Building a Prompt Engineering IDE (VS Code for AI)

Software Engineers have dedicated tools to develop code, and Prompt Engineers should have dedicated tools to craft prompts

Toni Engelhardt

Jun 6, 2023 · 8 min read

With the release of OpenAI's ChatGPT in November 2022, we have entered a new era of product development and work in general. Thanks to the superhuman capabilities of Large Language Models (LLMs), many digital text-based tasks can now be partially or even fully automated.

A recent study found that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs and Generative AI.

Just ask AI to do it, right?

Well, not so fast!

The latest language models can indeed take many laborious tasks off our hands, but we need to ask the right questions, and that is why the field of Prompt Engineering is so hot right now.

Let's take a look.

Content

Prompt Engineering
- What is Prompt Engineering?
- Which tasks can be automated with LLMs?
  - 1. One-time tasks
  - 2. Repetitive tasks
Prompt Engineering IDEs
- What is a Prompt Engineering IDE?
- Why you should use a Prompt Engineering IDE
  - Proper tooling for prompt design
  - Access to multiple Inference APIs and LLMs
Promptmetheus

Prompt Engineering

The art of talking to machines.

What is Prompt Engineering?

Prompt Engineering is the process of designing effective prompts to obtain desired outputs from a language model. In this context, effective means that a prompt generates satisfying completions quickly, reliably, and as cheaply as possible.

Let's talk about tasks first to define the scope.

Which tasks can be automated with LLMs?

We need to distinguish on a high level between the following two types of tasks: 1) one-time tasks and 2) repetitive tasks.

1. One-time tasks

Examples

Retrieving specific information from the internet, e.g. "What happened in 1971?"
Editing documents, e.g. "help me complete this post section"

One-time tasks refer to standalone queries or questions where we need an answer only once. These tasks are best suited for conversational (or chat-based) interfaces (e.g. ChatGPT or Perplexity), where the user can engage in an interactive exchange with the language model. The primary goal is to guide the LLM to provide the desired information by posing questions judiciously.

Question (a.k.a. prompt) precision and response consistency are not the primary concerns in interactive mode, since the user can ask follow-up questions and request clarifications to improve the completions. We may or may not call this interactive approach Prompt Engineering, but it requires some practice, skill, and creativity to get useful results for the task at hand. Note that this might get easier as the models get better at understanding tasks and logical reasoning.

While not suitable for automation, chat mode unlocks the full potential of LLMs.

2. Repetitive tasks

Examples

Improve copy based on certain criteria, e.g. "rewrite these emails in formal language"
Extract information from incoming documents, e.g. "summarize this list of complaints"
Classify user-generated items, e.g. "determine the sentiment of these reviews"

Repetitive tasks are executed more than once, either periodically or on demand. To be viable, these tasks require the model to consistently generate a useful output in a single interaction. Accuracy and precision are therefore paramount when designing prompts for task automation. Crafting these highly reliable prompts is what we usually mean when we speak of Prompt Engineering. The process is considerably more difficult than chat mode, which is why tooling is so important here.

In the following, we'll look deeper into Prompt Engineering for repetitive tasks.

While certainly more limited in scope, prompts that only require a single interaction with an AI model (either via one-shot prompts or agents) hold the potential to automate expensive digital processes and therefore have immense economic value when deployed appropriately.

But how do we develop such valuable prompts?

Prompt Engineering IDEs

IDE stands for Integrated Development Environment, a term originally coined in modern software engineering. Since there are so many parallels between software engineering and prompt engineering, it makes a lot of sense to simply adopt the terminology. In fact, both fields are about developing instructions for microprocessors; the latter is just non-deterministic and operates at a higher level of abstraction (or compilation).

What is a Prompt Engineering IDE?

Engineering reliable prompts for deployment in apps, services, integrations, and workflows demands a lot of experimentation and fine-tuning. Simple testing environments like the OpenAI playground are great to get started, but they are not sufficient for advanced use. For that, we need a more sophisticated tool stack. Think VS Code for software engineering or Bloomberg Terminal for stock trading.

Definition by ChatGPT:

A Prompt Engineering IDE (also Prompt IDE or Integrated Prompting Environment) is a software tool that provides a user-friendly interface for creating and running prompts to interact with language models like GPT-4, Claude 3, and Command R+. It assists users in experimenting with different prompts and receiving responses from the language model in real time, making it easier to fine-tune and test various inputs and outputs.

Why you should use a Prompt Engineering IDE

Imagine telling a software engineer to develop the code for an app or website with Microsoft Word. Spoiler: they'd laugh you out of the room...

Why? Code is just text, after all. Well, because MS Word lacks the tooling that makes writing code efficient and fun. The OpenAI playground is similarly basic: you get a text input and some sliders to adjust model parameters, but that's it. If you try to construct even moderately complex prompts, you'll be lost before you even get started.

Proper tooling for prompt design

So what do we need? Which features would the ideal Prompt Engineering IDE need to have? This is of course dependent on the specific project, but here's a list of the features that I think would generally be nice to have:

Composability
Oftentimes you want to reuse instructions, experiment with distinct versions of a paragraph, or re-arrange sections via drag and drop.
Traceability
When playing around with different variations of a prompt, you want to keep track of which prompt generated which output, a history so to speak.
Input Variables
When testing prompts for real-world scenarios, it's handy if you can use variables. For instance, if you have a prompt that uses a person's name in multiple places, it's convenient to define the name as a variable. That way you can easily execute the prompt for different names without having to replace the name everywhere it is used.
Prompt Library
A central place where you can store all your prompts and organize them in projects and folders.
Version Control System
Just like with code (and git), you want to be able to track changes to your prompts and revert to previous versions if necessary.
Test Data Management System
Once the prompt is ready, you want to test its performance on a diverse set of real-world input data to make sure that it is robust.
Automatic Evaluations
Just like in software development, you need automatic evaluations (evals) to iterate efficiently on your prompt design and to make sure you do not break things as you optimize.
Prompt Deployment Pipeline
This goes a step further, but ideally, once you have a prompt that works, you can use it directly in your apps, integrations, and workflows without having to export and integrate it manually. Prompt endpoints (AIPIs) make this possible (see below).
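
To make a few of these features more tangible, here is a minimal sketch of how composability, input variables, test data, evals, and traceability could fit together. All names in it (`Block`, `compose`, `run_evals`, `fake_model`) are made up for illustration, and `fake_model` is only a stand-in for a real LLM call so that the example runs offline. It is not how any particular Prompt IDE implements these features.

```python
from dataclasses import dataclass
from string import Template
from typing import Callable


@dataclass(frozen=True)
class Block:
    """A reusable prompt fragment (hypothetical name, for illustration only)."""
    name: str
    text: str


def compose(blocks: list[Block], variables: dict[str, str]) -> str:
    """Join blocks in order and fill in $placeholders (composability + input variables)."""
    raw = "\n\n".join(block.text for block in blocks)
    # substitute() raises KeyError for a missing variable, which surfaces typos early.
    return Template(raw).substitute(variables)


def run_evals(
    blocks: list[Block],
    dataset: list[dict],
    model: Callable[[str], str],
    check: Callable[[str, str], bool],
) -> list[dict]:
    """Run the prompt on every test case and record prompt, output, and pass/fail."""
    history = []
    for case in dataset:
        prompt = compose(blocks, case["variables"])
        output = model(prompt)
        # Keeping prompt and output together is a simple form of traceability.
        history.append(
            {"prompt": prompt, "output": output, "passed": check(output, case["expected"])}
        )
    return history


def fake_model(prompt: str) -> str:
    """Assumption: a keyword rule standing in for a real LLM completion."""
    return "positive" if "love" in prompt.lower() else "negative"


blocks = [
    Block("instruction", "Classify the sentiment of the review as positive or negative."),
    Block("input", "Review by $name: $review"),
]
dataset = [
    {"variables": {"name": "Ada", "review": "I love this product."}, "expected": "positive"},
    {"variables": {"name": "Bob", "review": "It broke after a day."}, "expected": "negative"},
]

history = run_evals(blocks, dataset, fake_model, lambda out, exp: out == exp)
pass_rate = sum(record["passed"] for record in history) / len(history)
print(f"pass rate: {pass_rate:.0%}")
```

Swapping a block, re-ordering the list, or adding a test case changes only the data, not the evaluation loop, which is the main reason to treat prompts as structured objects rather than one long string.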

Access to multiple Inference APIs and LLMs

Aside from the functionality mentioned above, perhaps the most important feature of all is the ability to test your prompts with LLMs from different AI platforms (OpenAI, Gemini, Anthropic, Cohere, Hugging Face, Replicate, etc.) without having to copy-paste them manually into different playgrounds. The Prompt IDE should be able to interface with all relevant providers and allow you to compare outputs side by side.

You can read more about this topic in the "Stripe for AI: One Platform, all LLMs" post.
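
As a rough illustration of the side-by-side comparison described above, the following sketch treats every provider as a plain function from prompt to completion. The three providers are stubs invented for this example; in a real setup each would wrap the respective platform's SDK or HTTP API, which is deliberately left out here.

```python
from typing import Callable

# Assumption: a provider is modeled as a function from prompt to completion.
Provider = Callable[[str], str]


def compare(prompt: str, providers: dict[str, Provider]) -> dict[str, str]:
    """Send the same prompt to every provider and collect the outputs by name."""
    results = {}
    for name, complete in providers.items():
        try:
            results[name] = complete(prompt)
        except Exception as error:
            # One failing provider should not hide the results of the others.
            results[name] = f"<error: {error}>"
    return results


def side_by_side(results: dict[str, str]) -> str:
    """Render the outputs as aligned 'provider | output' lines."""
    width = max(len(name) for name in results)
    return "\n".join(f"{name.ljust(width)} | {output}" for name, output in results.items())


# Stub providers (hypothetical) so the sketch runs without API keys.
def provider_a(prompt: str) -> str:
    return prompt.upper()


def provider_b(prompt: str) -> str:
    return prompt[::-1]


def provider_down(prompt: str) -> str:
    raise TimeoutError("no response")


results = compare(
    "hello",
    {"provider_a": provider_a, "provider_b": provider_b, "provider_down": provider_down},
)
table = side_by_side(results)
print(table)
```

Hiding each platform behind the same small interface is what makes it cheap to add another provider later without touching the comparison logic.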

Promptmetheus

Despite high demand, there are no established Prompt IDE solutions on the market yet, as the entire discipline of Prompt Engineering is still in its infancy. Promptmetheus was created as an attempt to fill this gap.

Promptmetheus is designed from the ground up to make it easy to compose, test, optimize, and deploy complex prompts to supercharge your apps, integrations, and workflows with the mighty capabilities of artificial intelligence.

Our Prompt IDE takes a modular approach to prompt design and allows you to compose prompts by combining different text and data blocks like Lego bricks (composability). The app automatically keeps track of the entire design process and provides full traceability for how each output was generated, along with statistics on how each block performs. This way, you can playfully experiment with different prompt configurations and quickly find the best solution for your use case, without having to worry about losing track of your work.

Promptmetheus interface

You can also easily compare outputs from different models (see LLM Index for a list of supported models and their properties) and adjust model parameters (Token Limit, Temperature, Top P, Frequency Penalty, Presence Penalty, etc.) to optimize the results.

Once you have a prompt ready and tested, you can either export it in a variety of formats (.txt, .xls, .json, etc.) or directly deploy it to a bespoke AIPI endpoint or UI, where you and your apps and services can conveniently interact with it.

FAQ

What is an AIPI?

An AIPI (AI Programming Interface) is similar to a conventional API, but instead of executing code its endpoints execute prompts on a remote server and mediate interactions with AI platforms and large language models (LLMs).
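
To illustrate the idea, here is a small sketch of what happens conceptually behind such an endpoint: the endpoint name is resolved to a stored prompt, the request data is filled in, and the completion is returned. The registry, the endpoint name, the response format, and `stub_model` are all assumptions made for this example and do not describe the actual Promptmetheus API.

```python
import json
from string import Template

# Hypothetical registry: endpoint name -> deployed prompt template.
ENDPOINTS = {
    "extract-emotions": Template("List the emotions expressed in this journal entry:\n$entry"),
}


def stub_model(prompt: str) -> str:
    """Assumption: stands in for the call to an AI platform; it just echoes the prompt."""
    return f"echo: {prompt}"


def handle_request(endpoint: str, body: str, model=stub_model) -> str:
    """Resolve the endpoint to a prompt, fill it from the JSON body, and run the model."""
    template = ENDPOINTS.get(endpoint)
    if template is None:
        return json.dumps({"status": 404, "error": f"unknown endpoint: {endpoint}"})
    try:
        # json.loads raises ValueError on malformed JSON, substitute raises
        # KeyError on a missing variable and TypeError on a non-mapping body.
        prompt = template.substitute(json.loads(body))
    except (KeyError, ValueError, TypeError) as error:
        return json.dumps({"status": 400, "error": f"bad input: {error}"})
    return json.dumps({"status": 200, "completion": model(prompt)})


ok = json.loads(handle_request("extract-emotions", json.dumps({"entry": "I felt calm."})))
missing = json.loads(handle_request("extract-emotions", json.dumps({})))
unknown = json.loads(handle_request("does-not-exist", "{}"))
print(ok["status"], missing["status"], unknown["status"])
```

The caller only sends data and receives a completion; the prompt itself lives on the server, so it can be improved without redeploying the apps that use it.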

And this is just the tip of the iceberg. Promptmetheus also features device sync, real-time collaboration for teams, and much more.

To see it in action, take a look at this short (slightly outdated) demo that showcases how to create a prompt that can extract emotions from a journal entry:

I hope you are convinced?!

You can get started for free in the Forge playground or take advantage of the full feature set with the Archery IDE (7-day free trial).

Also don't forget to check out our compilation of "Prompt Engineering Tips & Tricks" to get up to speed fast.

Thanks for reading and stay tuned for exciting updates in the future!
