This is an evolving post...
Check back from time to time for the latest tips and tricks around Prompt Engineering and Prompt Optimization.
Content
- Wrap prompt data with markers
- LLMs need output tokens to "think"
- Split up prompts for complex tasks
- Use output primers in your prompts
- Give examples (if necessary)
- Mimic your own thought process
- Experimentation is key
- Use a Prompt Engineering IDE
- Tone and style of the prompt get adopted
- Make it work first, optimize later
- Try different AI platforms and models
- Look for existing prompt templates
- No need to be polite with LLMs
- Language models evolve, keep that in mind
Wrap prompt data with markers
If you need to pass data or context that the LLM should consider when completing a task, it is important to clearly mark where the data begins and where it ends.
Common ways to do this are [ ] brackets and --- dashed lines.
Example
The following is an entry from a micro journal:
---
Some entry here...
---
Extract all emotions that are present in the entry based on the following list:
[ Anger, Anxiety, Awe, Boredom, Calmness, Cheerfulness, Confusion, Contempt, Curiosity, Desire, Disappointment, Disgust, Embarrassment, Enthusiasm, Envy, ... ]
Print only the emotions as a comma-separated list, nothing else:
The "Best practices for prompt engineering with OpenAI" guide also suggests the usage of ### hashes and """ quotes to mark instructions. *** stars are also possible.
Promptmetheus allows you to define prefixes and suffixes for data blocks so that you do not need to mark each data item individually.
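If you assemble your prompts in code, the markers are just part of the prompt string. Here is a minimal Python sketch of the journal example above; the ENTRY and EMOTIONS variables are placeholders for your actual data:

```python
# Minimal sketch: wrap the data block in clear delimiters before sending it to the LLM.
ENTRY = "Some entry here..."  # placeholder journal entry
EMOTIONS = ["Anger", "Anxiety", "Awe", "Boredom", "Calmness", "Curiosity"]

prompt = (
    "The following is an entry from a micro journal:\n"
    "---\n"
    f"{ENTRY}\n"
    "---\n"
    "Extract all emotions that are present in the entry based on the following list:\n"
    f"[ {', '.join(EMOTIONS)} ]\n"
    "Print only the emotions as a comma-separated list, nothing else:"
)
```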
LLMs need output tokens to "think"
In May 2023, Andrej Karpathy, founding member of OpenAI and former Head of AI at Tesla, gave a talk at Microsoft Build titled "State of GPT". The talk is a treasure trove of AI wisdom, and if you haven't seen it yet, I'd highly recommend you watch it now:
The biggest takeaway from the talk was – at least for me – that LLMs have no intrinsic memory and therefore cannot "think" without generating tokens.
Think about how you solve a problem when someone asks you a question. First, you browse your memory and reason (or "think") inside your head. Then, when you come up with a solution, you communicate it to whoever was asking. Since the LLM cannot think inside its head, it needs to print out its chain of thought, which is then fed back into the model as input.
Therefore, if you ask for a short answer, you cannot expect a lot of reasoning to happen. To overcome this limitation, check out the next section.
Split up prompts for complex tasks
To get around the limitation mentioned above, we can use a two-step approach to let the LLM solve complex tasks.
- Reason
- Format
In the first prompt, we ask the LLM to reason step-by-step and come up with a long and detailed answer, including an explanation for how it came to its conclusion. Then, we ask in a second prompt to compress the answer into the output format we need.
Depending on the complexity of the task, it might even make sense to break your prompt down further, into 3, 4, or more requests. Keep in mind though that the more steps you add, the less predictable and reliable the outputs will be. In an automation scenario, I would therefore recommend no more than 2 steps.
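Here is a rough sketch of what such a two-step chain can look like in code. I'm using the OpenAI Python SDK purely as an example; the model name, the task, and the prompts are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def complete(prompt: str) -> str:
    """Send a single prompt and return the model's text completion."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder, use whichever model fits your task
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

task = "Some complex task..."  # placeholder

# Step 1: Reason – ask for a detailed, step-by-step answer.
reasoning = complete(
    "Solve the following task. Reason step by step and explain how you "
    f"arrive at your conclusion:\n---\n{task}\n---"
)

# Step 2: Format – compress the reasoning into the output format we need.
answer = complete(
    "Summarize the conclusion of the following analysis in a single sentence:\n"
    f"---\n{reasoning}\n---"
)
print(answer)
```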
Promptmetheus will soon make this two-step approach easy by allowing you to chain prompts together and execute them sequentially through the AIPI endpoints.
Use output primers in your prompts
Output priming refers to the practice of ending your prompt with the beginning of the expected output. You might also encounter this technique under the names "output hint" or "output indicator". The rationale behind output primers is the following: the LLM's task is always to "complete" the prompt you send. If you end your prompt with a full stop or a question mark, the model has few constraints on how it constructs its answer. If you instead end the prompt with an incomplete sentence or a command followed by a colon, you narrow down the space of logical first answer tokens dramatically. And since the algorithm completes your prompt one token at a time, the first few generated tokens are significant.
This simple trick can dramatically improve the precision and reliability of your prompts.
Example
Explain the following philosophical razor to a 6 year old.
Razor:
---
What can be asserted without evidence can be dismissed without evidence.
---
Instructions:
---
- Focus on meaning
- Use a maximum of 3 paragraphs
---
Explanation:
In the above example "Explanation:" is the output primer.
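If you build prompts programmatically, the primer is simply the last line of the prompt string. A minimal sketch of the example above:

```python
# Minimal sketch: end the prompt with the beginning of the expected output.
razor = "What can be asserted without evidence can be dismissed without evidence."

prompt = (
    "Explain the following philosophical razor to a 6 year old.\n"
    f"Razor:\n---\n{razor}\n---\n"
    "Instructions:\n---\n- Focus on meaning\n- Use a maximum of 3 paragraphs\n---\n"
    "Explanation:"  # <- the output primer; the model continues from here
)
```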
Give examples (if necessary)
The straightforward approach to prompting an LLM is to simply ask a question. This is known as a zero-shot prompt. Zero refers to the number of examples included in the prompt, i.e. none. If this approach yields reasonable results for your use case, it is the preferred way to go. Adding examples and thereby increasing the token count (and execution cost) only makes sense if we get a measurable improvement in the quality of the outputs.
If we cannot get good results with a zero-shot prompt though, it is always worth a try to add samples of what we expect as output and see if that makes a difference. A prompt with a single example is called a one-shot prompt and prompts with two or more examples are usually referred to as few-shot prompts.
Example
Extract keywords from the corresponding texts below.
---
Text 1: Stripe provides APIs that web developers can use to integrate payment processing into their websites and mobile applications.
Keywords 1: Stripe, payment processing, APIs, web developers, websites, mobile applications
---
Text 2: OpenAI has trained cutting-edge language models that are very good at understanding and generating text. Our API provides access to these models and can be used to solve virtually any task that involves processing language.
Keywords 2: OpenAI, language models, text processing, API.
---
Text 3: {text}
Keywords 3:
The example above was taken from the OpenAI docs.
The ideal number of examples to include is entirely empirical. Too few hurts output quality and too many drives up execution cost. With Promptmetheus, you can use a dedicated "sample" block and add fragments with a varying number of examples to measure what works best.
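If you assemble prompts in code instead, a small helper that accepts a variable number of examples makes this kind of measurement easy as well. A rough sketch, reusing the keyword-extraction example from above (function and variable names are placeholders):

```python
# Minimal sketch: build a few-shot prompt from a list of (text, keywords) examples.
EXAMPLES = [
    ("Stripe provides APIs that web developers can use to integrate payment "
     "processing into their websites and mobile applications.",
     "Stripe, payment processing, APIs, web developers, websites, mobile applications"),
    ("OpenAI has trained cutting-edge language models that are very good at "
     "understanding and generating text.",
     "OpenAI, language models, text processing, API"),
]

def few_shot_prompt(text: str, examples=EXAMPLES) -> str:
    parts = ["Extract keywords from the corresponding texts below."]
    for i, (sample, keywords) in enumerate(examples, start=1):
        parts.append(f"---\nText {i}: {sample}\nKeywords {i}: {keywords}")
    n = len(examples) + 1
    parts.append(f"---\nText {n}: {text}\nKeywords {n}:")
    return "\n".join(parts)

# Vary the number of examples (e.g. examples=EXAMPLES[:1]) and measure what works best.
print(few_shot_prompt("Promptmetheus is a prompt engineering IDE."))
```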
Mimic your own thought process
A great approach to prompt design is to mimic your own problem-solving flow. How would you solve the task? Introspect your train of thought and break the task down into small, tangible sub-tasks, following a causal chain of reasoning. Then direct the LLM through the steps one by one.
It helps to generate completions for sub-tasks individually to see if the LLM can solve them in isolation.
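A rough way to do that in code is to run each sub-task prompt on its own and inspect the completions before wiring them into a chain. In the sketch below, the sub-task prompts, the context, and the model name are illustrative placeholders, and the OpenAI SDK is used only as an example:

```python
from openai import OpenAI

client = OpenAI()
CONTEXT = "..."  # placeholder for the data the task operates on

# Illustrative sub-task prompts, one per step of the causal chain.
SUB_TASKS = [
    "List the key facts mentioned in the text below:\n---\n{context}\n---",
    "Based on the text below, identify possible causes of the problem:\n---\n{context}\n---",
]

# Run each sub-task in isolation to verify the LLM can handle it on its own.
for i, template in enumerate(SUB_TASKS, start=1):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": template.format(context=CONTEXT)}],
    )
    print(f"Sub-task {i}:\n{response.choices[0].message.content}\n")
```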
To get a better understanding of how such causal chains work, it might be useful for you to study how the human brain works. I can recommend the books "Thinking, Fast and Slow" by Daniel Kahneman and "How to Create a Mind" by Ray Kurzweil in this context.
Experimentation is key
Even if you follow all the best practices of prompt engineering and the tips and tricks in this list, in the end, crafting a good prompt comes down to experimenting and good old trial and error. Start by mimicking your own thought process (see above) to solve the problem at hand. If it doesn't work at all, try a new approach. If it works, make small changes and test if the output gets better or worse. Rinse and repeat. Fine-tune your prompt until you get the desired results.
Note: Large Language Models (LLMs) are non-deterministic by their very nature. Therefore, execute every design iteration multiple times to make sure that the prompt performs well consistently.
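A quick way to check consistency without dedicated tooling is to run the same prompt several times and look at the spread of outputs. A minimal sketch, again using the OpenAI Python SDK as an example (model name and prompt are placeholders):

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()
prompt = (
    "Classify the sentiment of the following statement as positive, neutral, or negative:\n"
    "---\nThe release went fine.\n---\nSentiment:"
)

# Run the identical prompt several times; non-determinism means the outputs can differ.
outputs = []
for _ in range(5):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    outputs.append(response.choices[0].message.content.strip())

print(Counter(outputs))  # a wide spread here signals an unreliable prompt
```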
Use a Prompt Engineering IDE
Since experimentation is key, appropriate tooling is essential to design and test prompts efficiently. The default OpenAI playground usually doesn't cut it, and if experimentation is too tedious and time-consuming, we tend not to iterate enough and end up with sub-par results.
This is obviously the point where I have to recommend Promptmetheus, since it was designed specifically for this use case and provides all the tools and functionality that make experimenting with prompts easy and efficient:
- Prompt composability via blocks
- Prompt variables
- Test datasets
- Prompt history and full traceability
- Cost estimation for completions
- Output evaluation and prompt performance statistics
- Real-time collaboration
- Support for all major LLM providers and models (see LLM Index)
You can find a full introduction and more details in the "Building a Prompt Engineering IDE (VS Code for AI)" post, or if you want to give it a spin, try the Archery IDE for free.
Tone and style of the prompt get adopted
Unless instructed otherwise, the LLM will usually adopt the tone and style of the prompt. This is, again, due to the fact that the model is actually not answering questions, but completing the prompt. It will optimize for consistency. Therefore, it's important to prime it with the right tone and style in your prompt.
Make it work first, optimize later
A common mistake is to try to optimize the prompt from the get-go. It's usually a better strategy to first find a prompt that works and then optimize it for token count, language, etc.
Try different AI platforms and models
For technical or other reasons, you might be bound to a specific AI platform or LLM in your project, but if not, it is always worth testing your prompts on different platforms (OpenAI, Gemini, Anthropic, Cohere, Hugging Face, Replicate, etc.) and with different models.
There is no such thing as a best model or platform for all tasks. Each model has its own strengths and weaknesses, and each platform has its quirks and/or limitations. Bigger or more recent models do not always perform better at a specific task. And even if they do, a cheaper, faster model might perform sufficiently well for your use case.
Have a look at our post "How to choose the right LLM for your use case" for a deep dive into the matter.
As new models arrive and existing ones get updated, don't forget to re-assess your choices from time to time.
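If you access the providers directly through their SDKs, a side-by-side comparison can be as simple as the sketch below. OpenAI and Anthropic are shown purely as examples, and the model names are placeholders; swap in whatever you have access to:

```python
import anthropic
from openai import OpenAI

prompt = "Explain Occam's razor in one sentence:"

# Same prompt, two providers – compare the outputs side by side.
openai_answer = OpenAI().chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

anthropic_answer = anthropic.Anthropic().messages.create(
    model="claude-3-haiku-20240307",  # placeholder model name
    max_tokens=200,
    messages=[{"role": "user", "content": prompt}],
).content[0].text

print("OpenAI:   ", openai_answer)
print("Anthropic:", anthropic_answer)
```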
Here, too, I need to recommend Promptmetheus or a similar prompt IDE, where you can test your prompt with different platforms and models at the click of a button and compare the results side by side.
Look for existing prompt templates
Don't re-invent the wheel. A lot of the sweat in prompt engineering comes from trial and error, and someone might have already solved the problem that you are tackling. There are plenty of prompt libraries out there; browse the web with your favorite search engine for existing solutions that can serve as a starting point for your prompt. Don't forget to experiment and fine-tune though!
No need to be polite with LLMs
LLMs do not (yet) get offended if you omit pleasantries. In fact, they are more likely to get confused by them. So, save your tokens and skip phrases like "please", "if you don't mind", "thank you", "I would like to", etc. and get straight to the point.
Language models evolve, keep that in mind
Most of the latest (and most powerful) LLMs are under active development. This usually means that the performance of the model is getting better on average over time. But that is not always the case and the performance of a model for a specific prompt can decrease dramatically as it is fine-tuned on other tasks. To make sure that your prompts keep working as expected, it's good practice to test and re-evaluate them continuously.
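A lightweight way to do this without any tooling is a small regression script that re-runs your prompt against fixed test cases and checks the outputs for expected content. A rough sketch (test cases, expected keywords, and the model name are illustrative placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Fixed test cases: input text plus substrings we expect to see in the output.
TEST_CASES = [
    ("Stripe provides APIs for payment processing.", ["stripe", "payment"]),
    ("OpenAI trains large language models.", ["openai", "language model"]),
]

def run_prompt(text: str) -> str:
    prompt = (
        "Extract keywords from the following text as a comma-separated list:\n"
        f"---\n{text}\n---\nKeywords:"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.lower()

# Re-run this check whenever the model (or your prompt) changes.
for text, expected in TEST_CASES:
    output = run_prompt(text)
    missing = [kw for kw in expected if kw not in output]
    print("PASS" if not missing else f"FAIL (missing: {missing})", "-", text)
```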
Promptmetheus is currently developing automated evals and prompt testing flows to take this tedious task off your hands.
To be continued...
Great, let's take your newly acquired skills for a spin in the Promptmetheus Prompt Engineering IDE.