Add cross-invocation budget manager (issue #44) #70
Merged
Adds an optional BudgetManager that tracks cumulative token usage across multiple Kernel.invoke() calls within a session. When attached via the new Kernel(budget_manager=...) keyword argument, the kernel reserves a budget slice before driver execution and reconciles the actual frame payload size afterwards.
As the remaining budget shrinks, the requested response_mode auto-escalates to a more aggressive tier:
- above 50% remaining: keeps the caller's mode
- 20-50%: downgrades raw to table
- 5-20%: floors at summary
- below 5%: forces handle_only
BudgetExhausted is raised before the driver runs once the budget is spent. The manager is optional and off by default: kernels constructed without one behave identically to today. DryRunResult now reports the live budget_remaining and the escalated response_mode so callers can preview their next invocation.
The new TokenCounter protocol lets callers plug in tiktoken or any other counter; the default is a chars/4 JSON-based approximation with no extra dependencies. A new optional [tiktoken] extra is reserved for the tiktoken-based counter.
Honours the existing weaver-spec invariants: every invocation still flows through the firewall (I-01) and produces an ActionTrace (I-02); the admin-only raw gate is preserved and applied before escalation.
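The escalation tiers described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the merged implementation: the names stricter and suggested_mode as free functions, the strictness ordering raw < table < summary < handle_only, and the treatment of exactly 20% and exactly 50% are assumptions. Only the tier ranges come from the description; the exact-5% case follows the boundary test mentioned later in this PR.

```python
from typing import Literal

ResponseMode = Literal["raw", "table", "summary", "handle_only"]

# Assumed ordering, least to most aggressive; the index is the strictness rank.
_MODES: tuple[ResponseMode, ...] = ("raw", "table", "summary", "handle_only")


def stricter(a: ResponseMode, b: ResponseMode) -> ResponseMode:
    """Return whichever of the two modes is more aggressive."""
    return a if _MODES.index(a) >= _MODES.index(b) else b


def suggested_mode(requested: ResponseMode, fraction_remaining: float) -> ResponseMode:
    """Pick the tier floor for the remaining budget, never relaxing the caller's mode."""
    floor: ResponseMode
    if fraction_remaining > 0.50:
        floor = "raw"          # keep whatever the caller asked for
    elif fraction_remaining >= 0.20:
        floor = "table"        # assumption: exactly 20% stays in this tier
    elif fraction_remaining >= 0.05:
        floor = "summary"      # exactly 5% is summary, not handle_only
    else:
        floor = "handle_only"
    return stricter(requested, floor)
```

Taking the stricter of the requested mode and the tier floor is what keeps a caller who already asked for handle_only from being relaxed to summary.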
Pull request overview
Adds a cross-invocation context budget feature to agent-kernel by introducing a session-level BudgetManager that tracks cumulative token usage across multiple Kernel.invoke() calls and escalates response_mode as remaining budget shrinks.
Changes:
- Introduces BudgetManager, TokenCounter, and default_token_counter for cumulative token budgeting and pluggable token counting.
- Integrates budget reservation/escalation/reconciliation into Kernel.invoke() and exposes Kernel.budget.
- Updates public exports, docs, changelog, and adds tests for budgeting behavior and kernel integration.
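A pluggable counter of the kind listed above might look like the following sketch. The callable shape of the protocol and the str() fallback for non-serializable values are assumptions made for illustration; the PR only states that the default is a chars/4 approximation over the JSON encoding and that a non-serializable fallback exists.

```python
import json
from typing import Any, Protocol


class TokenCounter(Protocol):
    """Assumed shape: any callable mapping a payload to a token estimate."""

    def __call__(self, value: Any) -> int: ...


def default_token_counter(value: Any) -> int:
    """Estimate tokens as one per four characters of the JSON encoding."""
    try:
        text = json.dumps(value)
    except (TypeError, ValueError):
        # Assumed fallback for payloads json cannot encode (sets, objects, ...).
        text = str(value)
    return len(text) // 4


def count_with(counter: TokenCounter, value: Any) -> int:
    """Hypothetical helper showing that any conforming callable can be swapped in."""
    return counter(value)
```

A tiktoken-backed counter would satisfy the same protocol by encoding the serialized payload and returning the number of tokens.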
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/test_kernel.py | Adds kernel-level integration tests for cross-invocation budgeting, escalation, dry-run behavior, and reservation release on driver failure. |
| tests/test_firewall.py | Adds unit tests for token counting and BudgetManager allocation/recording/escalation behavior. |
| src/agent_kernel/kernel.py | Wires BudgetManager into invoke() (reserve before drivers, escalate mode, record usage after firewall) and adds Kernel.budget. |
| src/agent_kernel/firewall/budgets.py | Expands budgets module with TokenCounter, default_token_counter, and the new BudgetManager. |
| src/agent_kernel/firewall/__init__.py | Re-exports budget manager and token counting APIs. |
| src/agent_kernel/errors.py | Adds BudgetExhausted error type. |
| src/agent_kernel/__init__.py | Exports BudgetManager, BudgetExhausted, and token counting APIs at top level. |
| pyproject.toml | Adds optional [tiktoken] extra. |
| docs/context_firewall.md | Documents cross-invocation budgeting and the escalation table. |
| CHANGELOG.md | Adds an entry describing the new budgeting feature and exports. |
Five fixes from the Copilot review:
1. Bare ValueError on BudgetManager validation violated AGENTS.md
("never raise bare ValueError to callers"). Replaced with a new
BudgetConfigError(AgentKernelError); updated tests.
2. firewall/budgets.py exceeded the ≤300 line guideline. Split into:
- budgets.py (28 lines, original Budgets dataclass only)
- token_counting.py (41 lines, TokenCounter + default_token_counter)
- budget_manager.py (275 lines, BudgetManager + helpers)
Public imports unchanged; everything re-exported via firewall/__init__.
3. invoke() did not mirror the Firewall's admin-only raw gate. A
non-admin requesting raw kept effective_mode == "raw", which made
the kernel skip handle creation even though the Firewall would then
downgrade to summary — yielding a summary frame without a handle.
The kernel now applies the same raw → summary downgrade before the
budget escalation and handle-creation decision. Added a regression
test covering the case.
4. A Firewall exception after a budget reservation permanently leaked
the reserved tokens. Wrapped the firewall transform + reconciliation
in try/finally that releases the reservation if record_usage never
ran. Added a regression test using a stub failing Firewall.
5. Updated firewall tests to assert BudgetConfigError instead of
ValueError, and verified BudgetConfigError is a subclass of
AgentKernelError.
make ci: lint clean, mypy clean, 403 tests pass (was 400; +3 from
this change), all examples run.
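The reservation leak fixed in item 4 can be illustrated with a toy version of the reserve, reconcile, release flow. Everything here is a sketch under assumptions: SketchBudgetManager, invoke_with_budget, the method signatures, and the recorded flag are hypothetical names, and the locking used by the real manager is omitted for brevity.

```python
import asyncio
from typing import Any, Awaitable, Callable


class BudgetExhausted(Exception):
    """Stand-in for the kernel's error type (hypothetical definition)."""


class SketchBudgetManager:
    """Toy tracker where remaining = total - used - reserved (no locking)."""

    def __init__(self, total_budget: int, default_request: int) -> None:
        self.total = total_budget
        self.default_request = default_request
        self.used = 0
        self.reserved = 0

    @property
    def remaining(self) -> int:
        return self.total - self.used - self.reserved

    def allocate(self) -> int:
        """Reserve a slice up front, or fail before any work is done."""
        if self.remaining <= 0:
            raise BudgetExhausted("session budget is spent")
        amount = min(self.default_request, self.remaining)
        self.reserved += amount
        return amount

    def record_usage(self, reserved: int, actual: int) -> None:
        """Swap the reservation for the measured payload size."""
        self.reserved -= reserved
        self.used += actual

    def release(self, reserved: int) -> None:
        """Give the reservation back when no frame was produced."""
        self.reserved -= reserved


async def invoke_with_budget(
    budget: SketchBudgetManager,
    driver: Callable[[], Awaitable[Any]],
    firewall: Callable[[Any], Any],
    count_tokens: Callable[[Any], int],
) -> Any:
    reserved = budget.allocate()  # raises BudgetExhausted before the driver runs
    recorded = False
    try:
        raw = await driver()
        frame = firewall(raw)
        budget.record_usage(reserved, count_tokens(frame))
        recorded = True
        return frame
    finally:
        # Covers both driver and firewall failures: release only if
        # record_usage never ran, so nothing is released twice.
        if not recorded:
            budget.release(reserved)
```

With this shape, a failing driver or firewall leaves remaining exactly where it was before the call.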
… test
- BudgetExhausted and BudgetConfigError docstrings referenced firewall.budgets.BudgetManager (pre-split path); they now point to firewall.budget_manager.BudgetManager.
- Add test_suggested_mode_boundary_exactly_five_percent_is_summary_not_handle_only to confirm that exactly 5% lands in the summary bucket, not handle_only; this guards against a future < → <= comparator change.
Summary
Closes #44.
LLM agents running long sessions gradually fill their context window without any visibility into how much budget remains across invocations. The kernel had no mechanism to track cumulative token usage or reduce response verbosity as context filled up; callers had to either guess, cap arbitrarily, or switch to summary mode globally.
This PR adds BudgetManager, an optional session-level budget tracker that monitors cumulative token usage across Kernel.invoke() calls, automatically escalates the response_mode as the remaining budget shrinks, and raises BudgetExhausted before the driver runs when the budget is gone.
What changed
| File | Change |
|---|---|
| src/agent_kernel/firewall/budget_manager.py | BudgetManager: allocate() / record_usage() / release(), suggested_mode(), remaining / fraction_remaining properties, asyncio.Lock-protected _BudgetState. |
| src/agent_kernel/firewall/token_counting.py | TokenCounter Protocol + default_token_counter (len(json.dumps(v)) // 4, no extra deps). |
| src/agent_kernel/firewall/budgets.py | Budgets dataclass only; cross-invocation tracking moved to budget_manager.py. |
| src/agent_kernel/kernel.py | Kernel.__init__ accepts optional budget_manager; invoke() allocates budget, escalates mode, records usage in try/finally; dry_run=True reports budget_remaining; new Kernel.budget property. |
| src/agent_kernel/errors.py | BudgetExhausted (raised before driver runs) and BudgetConfigError (raised on invalid construction or negative amounts). |
| src/agent_kernel/__init__.py | Exports BudgetManager, BudgetExhausted, BudgetConfigError, TokenCounter, default_token_counter. |
| src/agent_kernel/firewall/__init__.py | Adds BudgetManager, TokenCounter, default_token_counter to firewall exports and __all__. |
| pyproject.toml | tiktoken extra (tiktoken>=0.6) for more accurate token counting. |
| docs/context_firewall.md | |
| tests/test_firewall.py | |
| tests/test_kernel.py | |
| CHANGELOG.md | [Unreleased]. |
Design decisions
- suggested_mode(requested) takes the caller's requested mode as an argument, rather than the parameterless suggested_mode() in issue #44 (Cross-invocation context budget manager). This preserves stricter caller-requested modes: a caller that already requested handle_only will never have their mode relaxed to summary by the escalation table.
- Three-file split (budget_manager.py, token_counting.py, budgets.py). The original budgets.py reached ~340 lines when all three concerns were merged; splitting keeps each under the 300-line limit in AGENTS.md and keeps the per-invocation cap (Budgets) separate from the cross-invocation tracker (BudgetManager).
- allocate() reserves a budget slice before the driver runs; try/finally in kernel.invoke() calls release() on driver or firewall failure, so the session is not permanently penalized for errors. record_usage() reconciles the reservation with the actual payload size after the Frame is produced.
- default_token_counter (len(json.dumps(v)) // 4) needs no extra deps and is accurate to within ~10% for typical JSON payloads. Callers who need tiktoken precision can install weaver-kernel[tiktoken] and pass a custom TokenCounter.
- Kernels constructed without a BudgetManager behave identically to earlier versions.
Scope
In scope (delivered):
- BudgetManager with allocation, recording, release, and automatic mode escalation
- TokenCounter protocol and default_token_counter character-based implementation
- tiktoken extra for more accurate counting
- budget_manager param, Kernel.budget property, dry_run budget reporting
- BudgetExhausted and BudgetConfigError error types
Out of scope (deferred):
- … Kernel construction)
Testing
New tests are organized into:
- None handling, non-serializable fallback, character approximation ratio.
- Invalid total_budget / default_request raise BudgetConfigError.
- remaining / fraction_remaining after each operation.
- _stricter() never relaxes a stricter mode.
Risks
- json.dumps is O(n) per invocation. Acceptable: the Frame is already serialized at this point; measuring its character length adds negligible overhead.
- asyncio.Lock is not reentrant. A coroutine that called allocate() twice before the first finally resolved would deadlock. The kernel never recurses during a single invoke(), so this cannot trigger in practice.
Checklist
- make ci passes locally (fmt → lint → mypy strict → pytest → examples)
- … capability, principal, grant, Frame throughout
- Kernels without a BudgetManager are unaffected
- No bare ValueError/KeyError to callers; BudgetExhausted and BudgetConfigError used throughout
- CHANGELOG.md updated under [Unreleased]
- Docs updated (docs/context_firewall.md) in the same PR
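The non-reentrancy of asyncio.Lock noted under Risks can be observed in isolation. The sketch below is independent of the kernel code, and the function name reacquire_times_out is hypothetical; it only shows that a task already holding an asyncio.Lock blocks when it tries to acquire the same lock again, using a short timeout so the demonstration terminates instead of hanging.

```python
import asyncio


async def reacquire_times_out(timeout: float = 0.05) -> bool:
    """Return True if a second acquire from the holding task fails to complete."""
    lock = asyncio.Lock()
    async with lock:
        try:
            # The same task asks for the lock it already holds.
            await asyncio.wait_for(lock.acquire(), timeout=timeout)
        except asyncio.TimeoutError:
            return True  # blocked: the lock is not reentrant
        lock.release()  # only reached if the second acquire had succeeded
        return False


if __name__ == "__main__":
    print(asyncio.run(reacquire_times_out()))
```

Without the timeout the second acquire would wait indefinitely, which is the deadlock the risk note describes for nested acquisition.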