Context Compaction

Context compaction is the process of reducing the number of tokens used to represent an ongoing conversation or agent run while preserving the state needed for future steps. It enables long-running interactions to continue without repeatedly sending the complete execution history.

As an AI agent calls tools and collects observations, its context window can fill quickly. Larger inputs increase cost and latency, and excessive context can reduce attention to relevant details. Compaction is one technique within context engineering for controlling that growth.

Compaction methods include:

  • summarizing older messages into a shorter state description;
  • removing superseded plans, duplicate outputs, and low-value observations;
  • replacing large tool results with references to external artifacts;
  • extracting decisions, constraints, and unresolved tasks into structured state; and
  • retaining recent messages verbatim while compressing older history.

The main risk is information loss. A summary may omit a constraint that appeared unimportant earlier but becomes critical later. Repeated summarization can also introduce drift, where each compacted version moves further from the original record.

For this reason, compaction should not be the sole storage mechanism for authoritative information. Exact tool outputs, user approvals, financial values, source documents, and audit records should remain in external storage. Agent memory can preserve durable facts, while the compacted context carries only the working state needed by the model.

A robust compaction format typically records the current objective, completed work, active plan, important constraints, relevant facts with provenance, available artifacts, and unresolved errors. Systems should evaluate whether an agent can resume successfully from the compacted state without access to the full transcript.

Compaction may be performed by application code, a separate model call, or a model-provider feature. Triggering it at a threshold before the context limit leaves room for the compaction operation itself and for subsequent work. OpenAI's compaction guide describes server-side and standalone approaches for preserving state in long-running model interactions.

The LLM Knowledge Base is a collection of bite-sized explanations for commonly used terms and abbreviations related to Large Language Models and Generative AI.

It's an educational resource that helps you stay up-to-date with the latest developments in AI research and its applications.

Promptmetheus © 2023-present