LLM Knowledge Base

Bite-sized explanations for commonly used terms and abbreviations related to Large Language Models (LLMs) and Generative AI.


Artificial Intelligence (AI)

Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. This can include learning from experiences, reasoning through problems, understanding complex data, engaging in Natural Language Processing, and adapting to new inputs. AI systems can range from simple, rule-based algorithms to advanced neural networks and deep learning models. The goal of AI is to enable machines to perform tasks that would typically require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. AI is a foundational technology in various fields, including robotics, autonomous vehicles, and the software industry, where it powers innovations like chatbots, recommendation systems, and predictive analytics.

AI agent

In the context of Generative AI, "agent" typically refers to a software entity or system that uses generative models to perform tasks, engage in conversations, or provide services in a human-like or autonomous manner. These agents can encompass a wide range of applications, and their behavior is driven by one or more underlying generative AI models (see prompt chaining). Examples of AI agents include chatbots, virtual assistants, game NPCs, etc.

AI Programming Interface (AIPI)

An AIPI is similar to a conventional API (Application Programming Interface), but instead of executing code, its endpoints execute prompts on a remote server and mediate interactions with AI platforms and LLMs.



Chatbot

A chatbot is an Artificial Intelligence software application designed to simulate conversation with human users, typically over the internet. It uses Natural Language Processing and Machine Learning algorithms to understand and respond to user queries, providing information, assistance, or performing specific tasks through text or voice-based interactions. Chatbots are commonly used in customer service, support systems, and as virtual assistants, offering a scalable and cost-effective way for businesses to engage with customers and improve user experience.

Context window

Context Window refers to a specific range of words or data points that an AI model considers when predicting the next output. It's a crucial aspect of language models, where the context window helps the model understand the semantic relationship between words or phrases in a given text. The size of the context window can greatly influence the model's performance, as a larger context window provides more information for the model to base its predictions on, but may also increase computational complexity.

The size of the context window is often a limiting factor for non-trivial LLM applications.

Over time, as LLMs become more capable, context window sizes will likely keep increasing. They have already grown rapidly, from 2,048 tokens (GPT-3) to 100,000 (Claude 2) and most recently 200,000 (Claude 2.1).


Data loader

Data loaders are software components or utilities that facilitate the process of loading data into a system or database. They are designed to handle large volumes of data, ensuring efficient and accurate data transfer.

In the context of Generative AI, data loaders are used to feed data into the AI models during the training process, often providing features like batching, shuffling, and parallel data loading to optimize the training procedure.

In the context of Prompt Engineering tools and IDEs, data loaders are used to inject external data into a prompt, e.g. from a database, an API, or from the internet.
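The batching and shuffling features mentioned above can be illustrated with a minimal, pure-Python sketch (real training pipelines typically use a framework-provided loader, e.g. PyTorch's DataLoader; the function name here is illustrative):

```python
import random

def data_loader(dataset, batch_size, shuffle=True, seed=None):
    """Minimal data loader: optionally shuffles indices, then yields batches."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start : start + batch_size]]

samples = list(range(10))
for batch in data_loader(samples, batch_size=4, shuffle=True, seed=42):
    print(batch)  # three batches of up to 4 shuffled samples
```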



Embeddings

Embeddings in the context of Generative AI refer to the mathematical representation of complex data types, such as words, sentences, or even entire documents, in a high-dimensional space. These representations, often in the form of vectors, capture the semantic or contextual meaning of the data. Embeddings are a crucial part of many Machine Learning models, as they allow these models to understand and process data in a more human-like way. They are particularly useful in Natural Language Processing, recommendation systems, and other areas where understanding the relationships between different pieces of data is important.
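A common way to compare embeddings is cosine similarity. The toy example below uses made-up 3-dimensional vectors (real models produce hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: semantically similar words get similar vectors
cat = [0.9, 0.1, 0.2]
kitten = [0.85, 0.15, 0.25]
car = [0.1, 0.9, 0.4]

print(cosine_similarity(cat, kitten))  # close to 1.0 -> semantically similar
print(cosine_similarity(cat, car))     # much lower -> less similar
```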


Few-shot prompt

Few-shot prompt refers to a method where the AI model is given a small number of examples (usually less than 10) during the inference stage to understand and generate similar content. This approach helps the model to quickly adapt to new tasks, reducing the need for extensive training data. It's a technique commonly used in Natural Language Processing and Machine Learning models to improve their performance and versatility.

See also zero-shot prompt and one-shot prompt.
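A few-shot prompt can be as simple as a string containing a handful of labeled examples followed by the new input. The sketch below shows the pattern for a hypothetical sentiment-classification task:

```python
# Two labeled examples "teach" the task; the model completes the third label.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "Absolutely loved it, will buy again!"
Sentiment: Positive

Review: "Broke after two days. Waste of money."
Sentiment: Negative

Review: "The battery life exceeded my expectations."
Sentiment:"""

print(few_shot_prompt)
```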


Fine-tuning

Fine-tuning refers to the process of taking a pre-trained model and further training it on a specific dataset to enhance its performance. This technique is used to adapt a general-purpose model to a specific task or to improve its ability to understand specific nuances, contexts, or languages. Fine-tuning is a crucial step in Machine Learning and AI development, as it allows for more accurate and efficient models by leveraging existing neural networks and reducing the need for extensive training from scratch.

Foundation model

A Foundation Model is a type of artificial intelligence model that is pre-trained on a large amount of data and serves as a base for building more specific AI models. These models, due to their extensive training, have a broad understanding of language, images, or other types of data, and can be fine-tuned to perform specific tasks in the field of Generative AI, such as text generation, image synthesis, and more. Foundation models are key to many SaaS products in the AI domain due to their versatility and efficiency.

Frequency Penalty

Frequency Penalty is a parameter used in Generative AI models, particularly in language models, to control the repetition of generated content. It penalizes the model for repeatedly generating the same words or phrases, thereby encouraging diversity and novelty in the output. This parameter is adjustable, allowing users to balance between repetitiveness and creativity according to their specific needs.

For some models the frequency penalty can also be negative, which encourages repetition.
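Conceptually, the penalty is applied to the model's logits before sampling, scaled by how often each token has already appeared. A simplified sketch (the dictionary-based representation is illustrative; real models operate on logit tensors over the whole vocabulary):

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_tokens, penalty):
    """Subtract penalty * occurrence-count from each already-generated token's logit.
    A positive penalty discourages repetition; a negative one encourages it."""
    counts = Counter(generated_tokens)
    return {tok: logit - penalty * counts.get(tok, 0) for tok, logit in logits.items()}

logits = {"the": 2.0, "cat": 1.5, "sat": 1.0}
adjusted = apply_frequency_penalty(logits, ["the", "the", "cat"], penalty=0.5)
print(adjusted)  # "the" drops by 1.0 (two occurrences), "cat" by 0.5, "sat" unchanged
```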


Generative AI

Generative AI is a subset of artificial intelligence that leverages Machine Learning techniques to enable a system to generate new, previously unseen content, data, or information that is similar to or a variation of its training data. This can include a wide range of outputs, such as text, images, music, or even voice. Generative AI is commonly used in a variety of applications, such as content creation, data augmentation, and predictive modeling.



Inference

Inference refers to the process of using a trained model to make predictions or decisions. It involves inputting new, unseen data into the model and receiving an output that represents the model's best guess or prediction. This process is crucial in many AI applications, such as image recognition, natural language processing, and recommendation systems, where the model's ability to infer or predict outcomes based on learned patterns is utilized.

Input tokens

Input tokens refer to the individual units of information that are fed into an AI model as part of the prompt for processing. These tokens can be words, characters, or subwords, depending on the granularity of the model. They serve as the basis for the model to understand, analyze, and generate output. The number of input tokens often determines the computational requirements of the AI model, as more tokens typically require more processing power.

Input tokens are subject to the total (input + output) or specific (input only) token limit.

Integrated Prompt Engineering Environment (IPEE)

An IPEE is the analog to a code IDE in the world of prompt engineering. It's a software application that provides pro-grade tooling to develop reliable prompts more efficiently. PROMPTMETHEUS is an example of an IPEE.

Take a look at the "Building a Prompt Engineering IDE (VS Code for AI)" post for more details.



Jailbreak

In the context of Generative AI, "jailbreak" refers to the process of removing or circumventing restrictions imposed by an LLM's original developers. This allows users to gain access to additional functionalities, customization options, or the ability to run unauthorized or third-party software that is not typically permitted within the standard operating parameters. Jailbreaking in the traditional sense is associated with smartphones and other devices, but in the realm of Generative AI, it involves modifying the behavior of AI models to perform tasks or operate in ways not originally intended by their creators.

See also Prompt injection attack.


Large Language Models (LLMs)

Large Language Models (LLMs) are advanced Machine Learning algorithms that are trained on vast amounts of text data. They are designed to understand and generate human-like text based on the input they receive. LLMs are capable of tasks such as translation, question answering, and text generation, among others. They are a significant component of natural language processing and are instrumental in the development of AI applications that require a deep understanding of human language.


Logit

A Logit refers to the raw, unnormalized output values produced by a classification model. These values are typically transformed through a function like the softmax function to produce probabilities. The term "logit" comes from logistic regression, where it refers to the log-odds of a probability. In deep learning, logits have a significant role in various loss functions and layers.
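The logit-to-probability transformation via softmax can be sketched in a few lines of pure Python:

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # each value in (0, 1), ordered like the logits
print(sum(probs))  # sums to 1.0
```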


Machine Learning (ML)

Machine learning is a subset of artificial intelligence (AI) that enables software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. It involves the use of algorithms and statistical models to allow computers to learn from and make decisions based on data. Machine learning encompasses a range of techniques and methodologies, such as supervised learning, unsupervised learning, and reinforcement learning, each with specific applications and use cases. By analyzing and identifying patterns in large datasets, machine learning systems can adapt and improve their performance over time, making it a foundational technology for many predictive and analytical applications in various industries.


Natural Language Processing (NLP)

Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and humans through natural language. The ultimate objective of NLP is to read, decipher, understand, and make sense of the human language in a valuable way. It is used to analyze, understand, and derive meaning from human language in a smart and useful way, and is at the core of various applications like voice assistants, translation services, chatbots, and sentiment analysis tools.


One-shot prompt

One-shot prompt refers to a single input or instruction given to an AI model, which then generates a comprehensive output based on that single prompt. The term "one-shot" signifies that the AI model doesn't require multiple examples or extensive training to produce the desired output, but instead, it can understand and respond to the task effectively with just one prompt. This concept is particularly relevant in language models where a one-shot prompt can generate a complete text or answer a question.

See also zero-shot prompt and few-shot prompt.

PROMPTMETHEUS is specifically designed to forge these highly effective one-shot prompts.

Open-weights model

An "open-weights" model is a machine learning model whose parameters (or "weights") are publicly accessible and can be used or modified without restriction. Unlike closed or proprietary models, open-weights models are often shared within the AI community for research, educational purposes, or to foster innovation. They can be fine-tuned or adapted to specific tasks and contribute to the transparency and collaborative advancement of AI technology. Open-weights models can also facilitate reproducibility in AI research, allowing others to validate and build upon existing work.

Output tokens

Output tokens refer to the individual units of information produced by an AI model after processing the input data. They collectively form the generated text, code, etc., or what's known as the completion.

Output tokens are subject to the total (input + output) or specific (output only) token limit.



Playground

In the context of Generative AI, a Playground refers to an interactive online platform or environment where users can experiment, learn, and test AI models. It allows users to manipulate parameters, input data, and observe the output generated by the LLM, thereby providing a hands-on experience of how the AI system works. This tool is particularly useful for understanding the capabilities and limitations of AI models, and for fine-tuning them to achieve desired results.


Prompt

A prompt is an initial input or instruction given to an AI model, which guides the AI in generating subsequent content or responses. It can be a single word or up to multiple paragraphs that set the direction for the AI's output. Prompts are crucial in tasks such as text generation, language translation, and conversation simulation, as they provide a starting point for the AI's creative process.

Prompt chaining

Prompt chaining refers to the process of using the output of one LLM completion as the input or prompt for another completion in a sequence. This technique allows for the creation of more complex and nuanced responses, as each subsequent completion can build upon and refine the previous one. It is often used in Natural Language Processing tasks, such as text generation, to improve the coherence and relevance of the generated content. In theory, there is no limit to how many LLM completions can be chained together, and it is possible to use a different model for each completion.
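A two-step chain can be sketched as follows; `complete` is a placeholder standing in for a real LLM API call:

```python
def complete(prompt):
    """Placeholder for a real LLM API call -- returns canned text here."""
    return f"<completion of: {prompt[:30]}...>"

article_text = "A long article about context windows in LLMs."

# Step 1: summarize the source document
summary = complete(f"Summarize the following article:\n\n{article_text}")

# Step 2: feed the first completion into the next prompt
tweet = complete(f"Write a short tweet based on this summary:\n\n{summary}")
print(tweet)
```

Each step could just as well call a different model, e.g. a cheap model for summarization and a stronger one for the final output.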

Prompt Engineering

Prompt Engineering refers to the process of designing and optimizing prompts to effectively communicate with an AI model. It involves crafting inputs in a way that guides the AI to produce the desired output. This process is crucial in leveraging the full potential of AI models, as the quality and relevance of the output largely depend on the prompt's structure, context, and clarity.

Prompt IDE

A Prompt IDE (Integrated Development Environment) is a specialized software tool designed for developers and researchers working with Generative AI models, particularly those that utilize Natural Language Processing. It provides a user-friendly interface for crafting, refining, and testing prompts that are used to elicit specific responses or outputs from AI models like GPT-4 or Claude 2. A Prompt IDE typically includes features such as variables, testing environments, version control, and performance analytics to help users optimize their prompts for accuracy and creativity, thereby enhancing the interaction with AI systems. A Prompt IDE is essential for anyone looking to streamline the development process of AI-driven applications, ensuring that the prompts used are effective and produce the desired results.

Take a look at the "Building a Prompt Engineering IDE (VS Code for AI)" post for more details and sign up for PROMPTMETHEUS to give it a spin.

Prompt injection attack

A Prompt Injection Attack is a type of cybersecurity threat specific to systems utilizing Generative AI, particularly those that generate content based on user inputs, such as chatbots or AI writing assistants. In this attack, a malicious user crafts input prompts in a way that manipulates the AI into generating responses that include sensitive data, unintended actions, or biased content. This can compromise the integrity of the AI system, lead to data breaches, or cause the AI to behave in undesirable ways. Prompt Injection Attacks exploit the vulnerabilities in the AI's language understanding or processing capabilities to deceive the system into deviating from its intended function. Protecting against such attacks involves implementing robust input validation, monitoring, and AI training to recognize and resist malicious inputs.

Presence Penalty

Presence Penalty is a parameter used in Generative AI models to control the repetition of certain phrases or words in the generated text. A higher presence penalty discourages the model from using the same phrases or words frequently, thereby promoting diversity and novelty in the output. This parameter helps in fine-tuning the model's output to meet specific requirements and improve the overall quality of the generated content.


Reinforcement Learning from Human Feedback (RLHF)

RLHF is a Machine Learning approach that combines reinforcement learning (RL) with human feedback to train models, particularly in scenarios where it is challenging to define a clear reward function. In RLHF, a model learns to perform tasks by receiving signals or corrections from human supervisors, which guide the model towards desired behaviors. This method is often used in Generative AI to ensure that the output aligns with human values and preferences, improving the quality and safety of the AI's decisions or creations. RLHF can be particularly useful in domains like Natural Language Processing, where nuanced understanding and context are crucial.

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation is a methodology used in the field of Generative AI. It combines the benefits of both retrieval-based and generative systems for language processing. RAG utilizes an external knowledge source to retrieve relevant documents or information, and then uses a generative model to create a contextually appropriate response or output. This approach enhances the model's ability to generate detailed, accurate, and context-specific responses, making it particularly useful in applications such as chatbots, question-answering systems, and content creation tools.
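The retrieve-then-generate flow can be sketched in pure Python. Here a toy keyword-overlap scorer stands in for a real vector search, and `generate` is a placeholder for an LLM call; both function names are illustrative:

```python
def retrieve(query, documents, top_k=2):
    """Naive keyword-overlap retrieval; real systems use embedding similarity."""
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(documents, key=score, reverse=True)[:top_k]

def generate(prompt):
    """Placeholder for a real LLM call."""
    return f"<answer grounded in: {prompt[:60]}...>"

docs = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Python is a popular programming language created by Guido van Rossum.",
    "The Louvre in Paris houses the Mona Lisa.",
]
question = "When was the Eiffel Tower completed?"

# Retrieved documents are injected into the prompt as grounding context
context = "\n".join(retrieve(question, docs))
answer = generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
print(answer)
```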


System message

In the context of LLMs, a "system message" refers to a message or prompt used to set the context or guide the behavior of the model during a conversation. It is typically the first message in a dialogue and is used to instruct the LLM or establish the role it should play in the conversation. System messages help steer the interaction by providing context or specifying the desired format of the response, and they are important for shaping the LLM's behavior, ensuring it understands its role (e.g., as a tutor, chatbot, or information provider), and generating responses that align with the user's expectations.
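In the widely used chat-message format (e.g. OpenAI's Chat Completions API), the system message is simply the first entry in the messages list, with the role `system`:

```python
# The system message sets the model's role before any user input is processed
messages = [
    {
        "role": "system",
        "content": "You are a patient math tutor. Explain every step "
                   "and never just give the final answer.",
    },
    {"role": "user", "content": "How do I solve 2x + 6 = 10?"},
]
print(messages[0]["content"])
```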



Temperature

In the context of Generative AI, "Temperature" refers to a parameter in the probability distribution function used during the generation process. It controls the randomness of predictions by scaling the logits before applying softmax. A high temperature value results in more random outputs, while a low temperature value makes the model's outputs more deterministic and similar to the training data. It's a crucial factor in balancing between diversity and quality of generated content.

In an analogy to human cognition, one could compare the temperature value to a level of creativity: a high temperature results in more creative, surprising outputs, while a low temperature results in more focused and predictable outputs.
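The logit-scaling described above can be sketched directly: dividing the logits by the temperature before softmax sharpens the distribution for low temperatures and flattens it for high ones.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.5))  # peaked -> more deterministic
print(softmax_with_temperature(logits, 2.0))  # flatter -> more random
```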


Token

A token is the smallest unit of data that a model can understand and process. It can be as short as a single character or as long as a word, depending on the language and the specific model. Tokens are used to break down input data into manageable pieces, enabling the AI to analyze, understand, and generate text.

For most state-of-the-art models, one token corresponds to roughly 4 characters of English text on average.
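That heuristic makes for a quick back-of-the-envelope token estimate (for exact counts, use the model provider's tokenizer instead):

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token-count estimate via the ~4 characters/token heuristic."""
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # ~11
```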


Tokenization

Tokenization refers to the process of converting a sequence of text into individual pieces, known as tokens. These tokens can represent words, characters, or subwords, and serve as the basic units for understanding and processing text in Natural Language Processing (NLP) tasks. Tokenization helps AI models to better understand, analyze, and generate human language by breaking down complex sentences into simpler, manageable components.
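A deliberately naive word-and-punctuation tokenizer illustrates the idea; production LLMs use learned subword schemes such as Byte Pair Encoding (BPE) instead:

```python
import re

def tokenize(text):
    """Naive tokenizer: splits text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("LLMs don't process raw text!"))
# ['LLMs', 'don', "'", 't', 'process', 'raw', 'text', '!']
```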

Token limit

Token limit refers to the maximum number of tokens that a Generative AI model can process or generate in a single operation. The token limit is a crucial factor in determining the complexity and length of the content that the AI can handle (see also context window).

Different LLM providers handle token limits differently. Sometimes the limit applies to the total number of input plus output tokens (e.g. OpenAI); other times, limits are applied to input and output independently (e.g. PaLM 2).
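A combined (input + output) limit implies a simple budget check before sending a request; the function name below is illustrative:

```python
def fits_in_limit(input_tokens, max_output_tokens, token_limit):
    """Check whether a request fits a combined (input + output) token limit."""
    return input_tokens + max_output_tokens <= token_limit

# e.g. against a 4,096-token combined limit
print(fits_in_limit(3000, 1000, 4096))  # True  -> request fits
print(fits_in_limit(3500, 1000, 4096))  # False -> prompt must be shortened
```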

Top P

Top P, also known as Nucleus Sampling, is a strategy used in language models for text generation. Instead of choosing the most likely next word in a sequence, the model considers a set of possible next words, known as the "nucleus", based on their cumulative probability exceeding a certain threshold, P. This approach provides a balance between randomness and predictability, resulting in more diverse and realistic text generation.

In the context of tuning model parameters, Top P serves a similar purpose as Temperature, and it is usually recommended to adjust only one of the two.
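The nucleus-selection step can be sketched as follows: keep the most likely tokens until their cumulative probability reaches P, then renormalize and sample only from that set.

```python
def nucleus(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability >= top_p,
    then renormalize. `probs` maps token -> probability."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "xylophone": 0.05}
print(nucleus(probs, top_p=0.9))  # the unlikely tail ("xylophone") is dropped
```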


Transformer

A Transformer is a type of artificial intelligence model primarily used in the field of Natural Language Processing (NLP). Introduced in the paper "Attention is All You Need" by Vaswani et al., it revolutionized the NLP domain by using a mechanism called "attention" to understand the context of words in a sentence. Unlike previous models, Transformers do not process data in sequential order, but rather, they process all data points simultaneously, making them highly efficient for large-scale tasks. Transformers serve as the foundation for many advanced AI models, including BERT, GPT-3, and others.


Zero-shot prompt

Zero-shot prompt refers to a situation where an AI model generates an output or completes a task without any prior specific training or examples on that task. The model uses its general understanding from pre-training to generate the response, hence the term 'zero-shot', indicating no additional training shots were given for that specific task. This concept is often used in Natural Language Processing and machine learning models.

See also one-shot prompt and few-shot prompt.

This page is generated and updated with a one-shot prompt forged in the PROMPTMETHEUS IDE and a variety of different language models, including OpenAI GPT-3.5 and GPT-4, Anthropic Claude 2, Cohere Command Nightly, PaLM 2 Chat Bison, and Microsoft Copilot.

Supported LLMs


Anthropic

Claude 2.1

Claude 2

Claude Instant 1.2

Claude Instant 1


Cohere

Command / Nightly

Command Light


OpenAI

GPT-4 Turbo

GPT-4 / 32k

GPT-3.5 Turbo / 16k

GPT-3.5 Turbo Instruct

DaVinci 003

Curie 001

Babbage 001

Ada 001


Perplexity

Llama 2 70B

Code Llama 34B

Mistral 7B Instruct

OpenHermes 2.5 Mistral 7B

OpenHermes 2 Mistral 7B

pplx 70B chat alpha

pplx 7B chat alpha

PaLM 2

Text Bison

Chat Bison

Code Bison

NLP Cloud

Chat Dolphin


Aleph Alpha

Luminous Supreme

Luminous Extended

Luminous Base

AI21 Labs

Jurassic 2 Ultra

Jurassic 2 Mid

Jurassic 2 Light

Deep Infra

Llama 2 70B Chat HF

Llama 2 13B Chat HF

Llama 2 7B Chat HF

Mistral 7B Instruct v0.1



Hugging Face

coming soon...

