Context Budget
Deliberate allocation of a model's finite context window across different types of context, framing context engineering as an optimization problem with hard token constraints.
Also known as: Token Budget
Category: AI
Tags: ai, context-engineering, context-management, ai-context-patterns, optimization
Explanation
A context budget is the deliberate allocation of a model's finite context window across the different types of context it needs: instructions, knowledge, memory, tools, tool results, and the user query. It frames context engineering as an optimization problem with a hard constraint: everything must fit within the maximum token limit.
## Why Budgets Matter
The context window is finite. Everything competes for the same token space. Without a budget, context grows until it either hits the window limit or degrades output quality through context distraction and context entropy. The key insight: past a certain point, more context actively hurts. A budget forces you to decide what earns its place.
## Budget Allocation
A practical budget divides the context window across components:
- **Instructions**: rules, behavior, constraints -- stable but grows with scope
- **Knowledge**: domain facts, documentation -- largest consumer, needs pruning
- **Memory**: past interactions, learned patterns -- grows unboundedly without limits
- **Tools**: tool definitions, MCP schemas -- fixed per tool set
- **Tool results**: dynamic data from tool calls -- unpredictable, can spike
- **Query**: the actual user request -- usually small
- **Conversation**: prior turns in the exchange -- grows linearly with conversation length
## Budget Strategies
- **Progressive disclosure**: load only what is needed now; defer the rest
- **Lazy loading**: let the agent pull context on demand rather than front-loading everything
- **Compression**: summarize older context to free space for fresh information
- **Tiered priority**: define what gets cut first when the budget is tight (usually conversation history, then tool results, then knowledge)
- **Hard caps per component**: set maximum token allocations per context type to prevent any single component from starving the others
Related Concepts
← Back to all concepts