A context window is the maximum amount of input and generated state that a Large Language Model (LLM) can process during one interaction. It is usually measured in tokens and may include system instructions, user messages, retrieved documents, tool definitions, tool results, prior model output, and the requested response.
The context window is a capacity limit, not a guarantee that every included detail will be used equally well. Very long inputs can increase cost and latency and may reduce retrieval or reasoning accuracy. The Lost-in-the-Middle Effect is one example of performance degrading based on where information appears in a long prompt.
Applications should therefore treat the context window as an attention and cost budget. Context engineering selects the most relevant information for each model call, while context compaction reduces accumulated history during long-running conversations or agent tasks.
The usable output length may share the same limit as the input, depending on the model API. Developers should reserve sufficient capacity for the response and for any intermediate reasoning or tool interactions required by the task.
The LLM Knowledge Base is a collection of bite-sized explanations for commonly used terms and abbreviations related to Large Language Models and Generative AI.
It's an educational resource that helps you stay up-to-date with the latest developments in AI research and its applications.