
Prompt Engineering Tips & Tricks

The hacks and techniques that take your prompting skills to the next level and help you optimize your LLM prompts

Published on June 7, 2023 by
Toni Engelhardt and GPT-4

This is an evolving post. Check back from time to time for the latest tips and tricks around prompt engineering and prompt optimization as they are discovered by the AI community.


Wrap prompt data with markers

If you need to pass data or context to the LLM, which it should consider to complete a task, it is important to clearly mark where the data begins and where it ends.

Common ways to do this are [ ] brackets and --- dashed lines.

Example

The following is an entry from a micro journal:

---

Some entry here...

---

Extract all emotions that are present in the entry based on the following list:

[ Anger, Anxiety, Awe, Boredom, Calmness, Cheerfulness, Confusion, Contempt, Curiosity, Desire, Disappointment, Disgust, Embarrassment, Enthusiasm, Envy, ... ]

Print only the emotions as a comma-separated list, nothing else:

The "Best practices for prompt engineering with OpenAI" guide also suggests the usage of ### hashes and """ quotes to mark instructions. *** stars are also possible.

PROMPTMETHEUS allows you to define prefixes and suffixes for data blocks so that you do not need to mark each data item individually.
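As a minimal sketch of the marker technique, the journal prompt above could be assembled programmatically like this. The helper names `wrap_block` and `build_emotion_prompt` are hypothetical and only serve as illustration; they are not part of any library or of PROMPTMETHEUS.

```python
def wrap_block(data: str, marker: str = "---") -> str:
    """Surround a piece of data with marker lines so its start and end are unambiguous."""
    return f"{marker}\n{data.strip()}\n{marker}"


def build_emotion_prompt(entry: str, emotions: list[str]) -> str:
    """Assemble the journal prompt: marked entry, bracketed list, final instruction."""
    emotion_list = "[ " + ", ".join(emotions) + " ]"
    return "\n\n".join([
        "The following is an entry from a micro journal:",
        wrap_block(entry),
        "Extract all emotions that are present in the entry based on the following list:",
        emotion_list,
        "Print only the emotions as a comma-separated list, nothing else:",
    ])


prompt = build_emotion_prompt("Some entry here...", ["Anger", "Anxiety", "Awe"])
print(prompt)
```

Keeping the markers in one helper makes it easy to switch to ### or """ later and test whether that changes the results.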

LLMs need output tokens to "think"

In May, Andrej Karpathy, founding member of OpenAI and former Head of AI at Tesla, gave a talk at Microsoft Build 2023 titled "State of GPT". The talk is a treasure trove of AI wisdom and if you haven't seen it yet, I'd highly recommend you watch it.




The biggest takeaway from the talk was, at least for me, that LLMs have no intrinsic memory and therefore cannot "think" without generating tokens.

Think about how you solve a problem if someone asks you a question. First, you browse your memory and reason (or "think") inside of your head. Then, when you come up with a solution, you communicate it to whoever was asking. Since the LLM cannot think inside of its head, it needs to print out its chain of thought, which then is fed back into the model as input.

Therefore, if you ask for a short answer, you cannot expect a lot of reasoning to happen. To overcome this limitation, check out the next section.

Split up prompts for complex tasks

To get around the above-mentioned limitation, we can use a two-step approach to let the LLM solve complex tasks.

  1. Reason
  2. Format

In the first prompt, we ask the LLM to reason step-by-step and come up with a long and detailed answer, including an explanation for how it came to its conclusion. Then, we ask in a second prompt to compress the answer into the output format we need.

Depending on the complexity of the task, it might even make sense to break your prompt down further, into 3, 4, or more requests. Keep in mind though that the more steps you add, the less predictable and reliable the outputs will be. In an automation scenario, I would therefore recommend no more than 2 steps.

PROMPTMETHEUS will soon make this two-step approach easy by allowing you to chain prompts together and execute them sequentially through the AIPI endpoints.
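Until then, the two-step flow can be sketched in a few lines of Python. This is only an illustration under assumptions: `complete` stands for whatever function sends a prompt to your model and returns the completion, the prompt wording is made up, and `fake_complete` is a stand-in that lets the sketch run without an API key.

```python
from typing import Callable

# Assumed signature: takes a prompt string, returns the model's completion.
Complete = Callable[[str], str]


def reason_then_format(task: str, output_format: str, complete: Complete) -> str:
    """Run the two-step flow: a long reasoning pass, then a short formatting pass."""
    reasoning_prompt = (
        f"{task}\n\n"
        "Reason step-by-step and explain how you reach your conclusion.\n\n"
        "Reasoning:"
    )
    reasoning = complete(reasoning_prompt)

    # The full reasoning is fed back in as marked data for the second prompt.
    format_prompt = (
        "The following is a detailed answer to a task:\n\n"
        f"---\n{reasoning}\n---\n\n"
        f"Compress the answer into this format: {output_format}\n\n"
        "Output:"
    )
    return complete(format_prompt).strip()


def fake_complete(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    if prompt.endswith("Reasoning:"):
        return "17 has no divisors other than 1 and itself, so it is prime."
    return " yes "


result = reason_then_format("Is 17 a prime number?", "yes or no", fake_complete)
print(result)
```

In a real setup you would replace `fake_complete` with a call to your provider's API and log the intermediate reasoning, since that is usually where a failing prompt goes wrong.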

Use output primers in your prompts

Output priming refers to the practice of ending your prompt with the beginning of the expected output. You might also encounter this technique under the names "output hint" or "output indicator". The rationale for output primers is the following: the LLM's task is always to "complete" the prompt you send. If you end your prompt with a full stop or a question mark, the model has few constraints on how it constructs its answer. By ending the prompt with an incomplete sentence or a command followed by a colon, on the other hand, you narrow down the space of plausible first answer tokens dramatically. And since the model completes your prompt one token at a time, with every generated token conditioning the ones that follow, the first few generated tokens are significant.

This simple trick can dramatically improve the precision and reliability of your prompts.

Example

Explain the following philosophical razor to a 6 year old.

Razor:
---
What can be asserted without evidence can be dismissed without evidence.
---

Instructions:
---
- Focus on meaning
- Use a maximum of 3 paragraphs
---

Explanation:

In the above example "Explanation:" is the output primer.
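If you assemble prompts in code, the primer is simply the last thing you append. The following sketch rebuilds the razor prompt; `with_primer` is a hypothetical helper, not an existing API.

```python
def with_primer(instruction: str, sections: dict[str, str], primer: str) -> str:
    """Build a prompt whose last line is the output primer, e.g. "Explanation:"."""
    parts = [instruction]
    for label, body in sections.items():
        # Each section is a label followed by its data wrapped in --- markers.
        parts.append(f"{label}:\n---\n{body}\n---")
    parts.append(f"{primer}:")  # nothing may follow the primer
    return "\n\n".join(parts)


prompt = with_primer(
    "Explain the following philosophical razor to a 6 year old.",
    {
        "Razor": "What can be asserted without evidence can be dismissed without evidence.",
        "Instructions": "- Focus on meaning\n- Use a maximum of 3 paragraphs",
    },
    primer="Explanation",
)
print(prompt)
```

The important detail is that the prompt ends right after the colon, so the model's first generated tokens already belong to the explanation.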

Give examples (if necessary)

The straightforward approach to prompt an LLM is to simply ask a question. This is known as a zero-shot prompt. Zero refers to the number of examples included in the prompt, i.e. none. If this approach yields reasonable results for your use case, it is the preferred way to go. Adding examples and thereby increasing the token count (and execution cost) only makes sense if we get a measurable improvement in the quality of the outputs.

If we cannot get good results with a zero-shot prompt though, it is always worth a try to add samples of what we expect as output and see if that makes a difference. A prompt with a single example is called a one-shot prompt and prompts with two or more examples are usually referred to as few-shot prompts.

Example

Extract keywords from the corresponding texts below.

---
Text 1: Stripe provides APIs that web developers can use to integrate payment processing into their websites and mobile applications.
Keywords 1: Stripe, payment processing, APIs, web developers, websites, mobile applications
---
Text 2: OpenAI has trained cutting-edge language models that are very good at understanding and generating text. Our API provides access to these models and can be used to solve virtually any task that involves processing language.
Keywords 2: OpenAI, language models, text processing, API.
---
Text 3: {text}
Keywords 3:

The example above was taken from the OpenAI docs.

The ideal number of examples to include is completely empirical. Too few is bad for output quality and too many is bad for execution cost. With PROMPTMETHEUS you can use a dedicated "sample" block and add fragments with a varying number of examples to measure what works best.
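To compare zero-shot, one-shot, and few-shot variants of the same prompt, it helps to generate them from one list of examples. The sketch below does this for the keyword prompt above; the function name `few_shot_prompt` and its `shots` parameter are assumptions made for illustration.

```python
def few_shot_prompt(examples: list[tuple[str, str]], text: str, shots: int) -> str:
    """Build the keyword-extraction prompt with the first `shots` examples included."""
    blocks = []
    for i, (example_text, keywords) in enumerate(examples[:shots], start=1):
        blocks.append(f"Text {i}: {example_text}\nKeywords {i}: {keywords}")
    # The actual input comes last and ends with an open "Keywords N:" primer.
    n = len(blocks) + 1
    blocks.append(f"Text {n}: {text}\nKeywords {n}:")
    header = "Extract keywords from the corresponding texts below."
    return header + "\n\n---\n" + "\n---\n".join(blocks)


EXAMPLES = [
    (
        "Stripe provides APIs that web developers can use to integrate payment "
        "processing into their websites and mobile applications.",
        "Stripe, payment processing, APIs, web developers, websites, mobile applications",
    ),
    (
        "OpenAI has trained cutting-edge language models that are very good at "
        "understanding and generating text.",
        "OpenAI, language models, text processing, API.",
    ),
]

# Generate one prompt per shot count (0 = zero-shot) to test them side by side.
for shots in range(len(EXAMPLES) + 1):
    print(few_shot_prompt(EXAMPLES, "Some text here", shots))
    print()
```

Each variant can then be run against the same test inputs to see whether the extra examples are worth their token cost.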

Mimic your own thought process

A great approach to prompt design is to mimic your own problem-solving flow. How would you solve the task? Introspect your train of thought and break the task down into small tangible sub-tasks, following a causal chain of reasoning. Then direct the LLM through the steps one-by-one.

It helps to generate completions for sub-tasks individually to see if the LLM can solve them in isolation.

To get a better understanding of how such causal chains work, it might be useful for you to study how the human brain works. I can recommend the books Thinking, Fast and Slow by Daniel Kahneman and How to Create a Mind by Ray Kurzweil in this context.

Experimentation is key

Even if you follow all the best practices of prompt engineering and the tips and tricks in this list, in the end, crafting a good prompt comes down to experimenting and good old trial and error. Start by mimicking your own thought process (see above) to solve the problem at hand. If it doesn't work at all, try a new approach. If it works, make small changes and test if the output gets better or worse. Rinse and repeat. Fine-tune your prompt until you get the desired results.

Note: Large language models usually sample their output tokens, so the same prompt can produce different completions from run to run. Therefore, execute every design iteration multiple times to make sure that the prompt performs well consistently.
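A simple way to quantify consistency is to run the prompt several times and check how often the most common output appears. The sketch below assumes a `complete` function that sends a prompt to your model; `fake_complete` merely cycles through canned answers so the example runs offline.

```python
import itertools
from collections import Counter
from typing import Callable


def consistency(prompt: str, complete: Callable[[str], str], runs: int = 5) -> tuple[str, float]:
    """Run the prompt `runs` times; return the most common output and its share of runs."""
    outputs = [complete(prompt).strip() for _ in range(runs)]
    top_output, count = Counter(outputs).most_common(1)[0]
    return top_output, count / runs


# Stand-in for a sampled model: four identical answers and one deviation.
_canned = itertools.cycle(["Joy", "Joy", "Joy, Awe", "Joy", "Joy"])


def fake_complete(prompt: str) -> str:
    return next(_canned)


answer, share = consistency("Extract the emotions from the entry:", fake_complete, runs=5)
print(f"{answer!r} in {share:.0%} of runs")
```

Exact string matching is a strict criterion; for free-form outputs you would need a looser comparison, but for short, structured outputs it is a reasonable first check.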

Use a Prompt Engineering IDE

Since experimentation is key, appropriate tooling is essential to design and test prompts efficiently. The default OpenAI playground usually doesn't cut it, and if experimentation is too tedious and too time-consuming, we tend to not iterate enough and get sub-par results.

This is obviously the point where I have to recommend PROMPTMETHEUS, since it was designed specifically for this use case and provides all the tools and functionality that make experimenting with prompts easy and efficient:

  • Prompt composability via blocks
  • Prompt variables
  • Test datasets
  • Prompt history and full traceability
  • Cost estimation for completions
  • Output evaluation and prompt performance statistics
  • Real-time collaboration
  • Support for all major LLM providers and models (see LLM Index)

You can find a full introduction and more details in the Prompt Engineering IDEs post, or if you want to give it a spin, try the Archery IDE for free.

Tone and style of the prompt get adopted

Unless instructed otherwise, the LLM will usually adopt the tone and style of the prompt. This is, again, due to the fact that the model is actually not answering questions, but completing the prompt. It will optimize for consistency. Therefore, it's important to prime it with the right tone and style in your prompt.

Make it work first, optimize later

A common mistake is to try to optimize the prompt from the get-go. It's usually a better strategy to first find a prompt that works and then optimize it for token count, language, etc.

Try different AI platforms and models

Due to technical or other reasons you might be bound to a specific AI platform or LLM in your project, but if not, it is always worth it to test your prompts on different platforms (OpenAI, Bard, Anthropic, Cohere, Hugging Face, Replicate, etc.) and with different models.


There is no such thing as a best model or platform for all tasks. Each model has its own strengths and weaknesses and each platform has quirks and/or limitations. Bigger, or more recent models do not always perform better in a specific task. And even if they do, a cheaper, faster model might perform sufficiently well for your use case.

As new models arrive and existing ones get updated, don't forget to reassess your choices from time to time.

Here, too, I need to recommend PROMPTMETHEUS or a similar Prompt IDE, where you can test your prompt with different platforms and models with a single click and compare the results side by side.
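If you prefer to script the comparison yourself, the core of it is a loop over model clients. In this sketch every model is represented by a function that takes a prompt and returns a completion; the model names and the lambdas are placeholders, not real endpoints.

```python
from typing import Callable


def compare_models(prompt: str, models: dict[str, Callable[[str], str]]) -> dict[str, str]:
    """Send the same prompt to every model and collect the outputs by model name."""
    return {name: complete(prompt).strip() for name, complete in models.items()}


# Placeholder clients; in practice each one would wrap a provider's API call.
models = {
    "large-model": lambda prompt: "Stripe, payment processing, APIs",
    "small-model": lambda prompt: "Stripe, payments",
}

results = compare_models("Extract keywords from the text below.", models)
for name, output in results.items():
    print(f"{name:12} {output}")
```

Wrapping each provider behind the same prompt-in, text-out signature keeps the comparison code independent of any particular SDK.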

Look for existing prompt templates

Don't re-invent the wheel. A lot of the sweat in prompt engineering comes from trial and error, and someone might have already solved the problem that you are tackling. There are a lot of prompt libraries out there, so search the web for existing solutions that can serve as a starting point for your prompt. Don't forget to experiment and fine-tune though!

No need to be polite with LLMs

LLMs do not (yet) get offended if you omit pleasantries. In fact, they are more likely to get confused by them. So, save your tokens and skip phrases like "please", "if you don't mind", "thank you", "I would like to", etc. and get straight to the point.

Language models evolve, keep that in mind

Most of the latest (and most powerful) LLMs are under active development. This usually means that the performance of the model is getting better on average over time. But that is not always the case and the performance of a model for a specific prompt can decrease dramatically as it is fine-tuned on other tasks. To make sure that your prompts keep working as expected, it's good practice to test and re-evaluate them continuously.

PROMPTMETHEUS is currently developing automated prompt testing flows to take this tedious task off your hands.
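In the meantime, a basic regression check can be scripted by hand: keep a few inputs with known-good outputs and re-run them whenever the model changes. The sketch below is an assumption-laden illustration, not the PROMPTMETHEUS implementation; `failing_cases`, the sentiment template, and the `fake_complete` stand-in are all hypothetical.

```python
from typing import Callable


def failing_cases(
    template: str,
    cases: list[tuple[str, str]],
    complete: Callable[[str], str],
) -> list[str]:
    """Return the inputs whose completion no longer matches the expected output."""
    failures = []
    for text, expected in cases:
        output = complete(template.format(text=text)).strip()
        if output != expected:
            failures.append(text)
    return failures


TEMPLATE = (
    "Classify the sentiment of the text as positive or negative.\n\n"
    "---\n{text}\n---\n\n"
    "Sentiment:"
)

CASES = [
    ("I love it", "positive"),
    ("I hate it", "negative"),
    ("Not bad at all", "positive"),
]


def fake_complete(prompt: str) -> str:
    """Stand-in model that only recognizes the word "love" as positive."""
    return "positive" if "love" in prompt else "negative"


print(failing_cases(TEMPLATE, CASES, fake_complete))
```

Run on a schedule or after every model update, a check like this flags the prompts that need to be revisited.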



To be continued...



Great, let's take your newly acquired skills for a spin in our Prompt Engineering IDE.

