Add cross-invocation budget manager (issue #44) #70

Merged
dgenio merged 3 commits into main from claude/triage-issues-WowaN
May 19, 2026

@dgenio commented May 15, 2026

Summary

Closes #44.

LLM agents running long sessions gradually fill their context window without any visibility into how much budget remains across invocations. The kernel had no mechanism to track cumulative token usage or reduce response verbosity as context filled up — callers had to either guess, cap arbitrarily, or switch to summary mode globally.

This PR adds BudgetManager, an optional session-level budget tracker that monitors cumulative token usage across Kernel.invoke() calls, automatically escalates the response_mode as the remaining budget shrinks, and raises BudgetExhausted before the driver runs when the budget is gone.
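A minimal sketch of the reserve, run, reconcile flow described above. ToyBudget, invoke, the `request` size, and the `len(str(result)) // 4` cost measure are all hypothetical stand-ins, not the real BudgetManager or Kernel.invoke() signatures; the real manager also guards its state with an asyncio.Lock, which is omitted here.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


class BudgetExhausted(Exception):
    """Stand-in for agent_kernel's BudgetExhausted (hypothetical shape)."""


@dataclass
class ToyBudget:
    """Hypothetical session tracker; not the real BudgetManager API."""

    total: int
    used: int = 0
    reserved: int = 0

    @property
    def remaining(self) -> int:
        return self.total - self.used - self.reserved

    def allocate(self, amount: int) -> int:
        if self.remaining <= 0:
            raise BudgetExhausted("session budget is spent")
        grant = min(amount, self.remaining)  # never reserve more than is left
        self.reserved += grant
        return grant

    def record_usage(self, reservation: int, actual: int) -> None:
        # Reconcile: drop the reservation, charge the measured payload size.
        self.reserved -= reservation
        self.used += actual

    def release(self, reservation: int) -> None:
        # Failure path: hand the reserved slice back untouched.
        self.reserved -= reservation


async def invoke(
    budget: ToyBudget, driver: Callable[[], Awaitable[Any]], request: int = 100
) -> Any:
    reservation = budget.allocate(request)  # raises before the driver ever runs
    recorded = False
    try:
        result = await driver()
        budget.record_usage(reservation, actual=len(str(result)) // 4)
        recorded = True
        return result
    finally:
        if not recorded:
            budget.release(reservation)
```

Releasing from a flag-guarded `finally` rather than an `except` clause also returns the slice on asyncio.CancelledError, which derives from BaseException rather than Exception.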

What changed

| File | Change |
|---|---|
| src/agent_kernel/firewall/budget_manager.py | New (223 lines). BudgetManager: allocate() / record_usage() / release(), suggested_mode(), remaining / fraction_remaining properties, asyncio.Lock-protected _BudgetState. |
| src/agent_kernel/firewall/token_counting.py | New (42 lines). TokenCounter Protocol + default_token_counter (len(json.dumps(v)) // 4, no extra deps). |
| src/agent_kernel/firewall/budgets.py | Reduced to 27 lines. Per-invocation Budgets dataclass only; cross-invocation tracking moved to budget_manager.py. |
| src/agent_kernel/kernel.py | Kernel.__init__ accepts optional budget_manager; invoke() allocates budget, escalates mode, records usage in try/finally; dry_run=True reports budget_remaining; new Kernel.budget property. |
| src/agent_kernel/errors.py | New BudgetExhausted (raised before driver runs) and BudgetConfigError (raised on invalid construction or negative amounts). |
| src/agent_kernel/__init__.py | Re-exports BudgetManager, BudgetExhausted, BudgetConfigError, TokenCounter, default_token_counter. |
| src/agent_kernel/firewall/__init__.py | Adds BudgetManager, TokenCounter, default_token_counter to firewall exports and __all__. |
| pyproject.toml | Optional tiktoken extra (tiktoken>=0.6) for more accurate token counting. |
| docs/context_firewall.md | New "Cross-invocation budgets" section with escalation table and tiktoken usage example. |
| tests/test_firewall.py | 27 new tests: construction validation, allocation/recording/release, escalation table (including exact 5%, 20%, 50% boundaries), custom counter, properties. |
| tests/test_kernel.py | 9 new tests: end-to-end budget tracking, mode escalation, exhaustion before driver runs, release on driver/firewall failure, dry-run budget reporting. |
| CHANGELOG.md | Feature entries under [Unreleased]. |
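The token-counting contract in token_counting.py can be sketched roughly as follows. This assumes TokenCounter is a callable Protocol returning an int; the None-costs-zero rule and the str() fallback for values json.dumps rejects are guesses at the behavior the "None handling" and "non-serializable fallback" tests cover, not the shipped code.

```python
import json
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class TokenCounter(Protocol):
    """Assumed contract: map any payload value to an approximate token count."""

    def __call__(self, value: Any) -> int: ...


def default_token_counter(value: Any) -> int:
    """Estimate tokens as (JSON character count) // 4."""
    if value is None:
        return 0  # assumed: an absent payload costs nothing
    try:
        text = json.dumps(value)
    except (TypeError, ValueError):
        # Assumed fallback for values json.dumps rejects (sets, cycles, objects).
        text = str(value)
    return len(text) // 4
```

Because the Protocol is structural, any plain function with this shape (for example one backed by tiktoken) qualifies as a TokenCounter without subclassing.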

Design decisions

  • suggested_mode(requested) takes the caller's requested mode as an argument, rather than the parameterless suggested_mode() proposed in issue #44. This preserves stricter caller-requested modes: a caller that already requested handle_only will never have their mode relaxed to summary by the escalation table.
  • Three-file split for the firewall budget layer (budget_manager.py, token_counting.py, budgets.py). The original budgets.py reached ~340 lines when all three concerns were merged; splitting keeps each under the 300-line limit in AGENTS.md and keeps the per-invocation cap (Budgets) separate from the cross-invocation tracker (BudgetManager).
  • Reservation pattern: allocate() reserves a budget slice before the driver runs; try/finally in kernel.invoke() calls release() on driver or firewall failure, so the session is not permanently penalized for errors. record_usage() reconciles the reservation with the actual payload size after the Frame is produced.
  • Only LLM-facing payload counts toward usage (facts, table rows, raw data). Kernel bookkeeping (provenance, action IDs, handle IDs) is excluded to avoid inflating the counter with internal overhead.
  • Character-based default_token_counter (len(json.dumps(v)) // 4) needs no extra deps and is accurate to within ~10% for typical JSON payloads. Callers who need tiktoken precision can install weaver-kernel[tiktoken] and pass a custom TokenCounter.
  • Backward compatible: kernels without a BudgetManager behave identically to earlier versions.
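To illustrate the payload-only rule, here is a sketch that measures only the LLM-facing part of a frame. The dict-shaped frame and the field names facts, table_rows, and raw_data are hypothetical; the real Frame is likely a typed object with its own attribute names.

```python
import json
from typing import Any

# Hypothetical field names for the LLM-facing payload.
LLM_FACING_FIELDS = ("facts", "table_rows", "raw_data")


def llm_payload(frame: dict[str, Any]) -> dict[str, Any]:
    """Keep only fields the model will read; drop provenance, action/handle IDs."""
    return {k: frame[k] for k in LLM_FACING_FIELDS if frame.get(k) is not None}


def payload_tokens(frame: dict[str, Any]) -> int:
    # Same chars // 4 heuristic as the default counter, applied to the payload only.
    return len(json.dumps(llm_payload(frame))) // 4
```

A large provenance block therefore costs nothing against the session budget, which is the point of excluding kernel bookkeeping.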

Scope

In scope (delivered):

  • BudgetManager with allocation, recording, release, and automatic mode escalation
  • TokenCounter protocol and default_token_counter character-based implementation
  • Optional tiktoken extra for more accurate counting
  • Kernel integration: budget_manager param, Kernel.budget property, dry_run budget reporting
  • BudgetExhausted and BudgetConfigError error types

Out of scope (deferred):

  • Persistent cross-session budget storage (in-memory only, resets at Kernel construction)
  • Per-driver token counting for internal bookkeeping fields
  • Budget visualization or management API

Testing

ruff format --check src/ tests/  →  already formatted
ruff check src/ tests/           →  All checks passed
mypy src/                        →  Success: no issues found in 27 source files
pytest -q tests/test_firewall.py tests/test_kernel.py  →  89 passed in 0.86s
CI matrix (3.10/3.11/3.12 + weaver-spec conformance stub)  →  4/4 pass

New tests are organized into:

  • Token counter — protocol contract, None handling, non-serializable fallback, character approximation ratio.
  • Construction validation — non-positive total_budget / default_request raise BudgetConfigError.
  • Allocation, recording, release — basic usage, negative-amount guards, remaining / fraction_remaining after each operation.
  • Escalation table — interior buckets and exact boundaries (5%, 20%, 50%); _stricter() never relaxes a stricter mode.
  • Kernel integration — end-to-end recording, mode escalation mid-session, exhaustion before driver runs, reservation release on driver and firewall failure, dry-run reporting.

Risks

  • json.dumps is O(n) per invocation. Acceptable — the Frame is already serialized at this point; measuring its character length adds negligible overhead.
  • asyncio.Lock is not reentrant. A coroutine that called allocate() twice before the first finally resolved would deadlock. The kernel never recurses during a single invoke(), so this cannot trigger in practice.
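The failure mode behind the second risk, a coroutine re-acquiring an asyncio.Lock it already holds, can be reproduced with the standard library alone. In this illustrative sketch (function names are not from the PR) asyncio.wait_for bounds the second acquire so the demo times out instead of hanging; for contrast, acquisitions that are each released before the next one starts do not block.

```python
import asyncio


async def reacquire_times_out(timeout: float = 0.05) -> bool:
    """True if re-acquiring a held asyncio.Lock blocks (it always does)."""
    lock = asyncio.Lock()
    async with lock:
        try:
            # Same coroutine, same lock: this wait can never succeed.
            await asyncio.wait_for(lock.acquire(), timeout=timeout)
        except asyncio.TimeoutError:
            return True
        lock.release()  # unreachable for asyncio.Lock, kept for symmetry
        return False


async def sequential_calls_ok() -> bool:
    """Back-to-back acquisitions are fine: each block releases before the next."""
    lock = asyncio.Lock()
    for _ in range(2):
        async with lock:
            pass
    return not lock.locked()
```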

Checklist

  • make ci passes locally (fmt → lint → mypy strict → pytest → examples)
  • CI green on Python 3.10 / 3.11 / 3.12 + weaver-spec conformance stub
  • Docstrings match the final implementation
  • No dead code — all new parameters, helpers, and types exercised by tests
  • Naming consistent: capability, principal, grant, Frame throughout
  • Backward-compat: kernels without BudgetManager are unaffected
  • No bare ValueError / KeyError to callers — BudgetExhausted and BudgetConfigError used throughout
  • CHANGELOG.md updated under [Unreleased]
  • Updated canonical doc (docs/context_firewall.md) in the same PR

Adds an optional BudgetManager that tracks cumulative token usage across
multiple Kernel.invoke() calls within a session. When attached via the
new Kernel(budget_manager=...) keyword argument, the kernel reserves a
budget slice before driver execution and reconciles the actual frame
payload size afterwards. As the remaining budget shrinks, the requested
response_mode auto-escalates to a more aggressive tier (>50% remaining
keeps the caller's mode; 20-50% downgrades raw to table; 5-20% floors at
summary; <5% forces handle_only). BudgetExhausted is raised before the
driver runs once the budget is spent.
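A sketch of the escalation table and the never-relax rule, assuming modes are ordered raw < table < summary < handle_only by aggressiveness. Here suggested_mode takes fraction_remaining explicitly (the real method reads it from the manager), and the handling of exactly 20% and exactly 50% is an assumption; only the 5% boundary (summary, not handle_only) is pinned by a dedicated test in this PR.

```python
from typing import Literal

Mode = Literal["raw", "table", "summary", "handle_only"]

# Least to most aggressive; a higher index means a smaller response.
_ORDER: tuple[Mode, ...] = ("raw", "table", "summary", "handle_only")


def _stricter(a: Mode, b: Mode) -> Mode:
    """Return the more aggressive of two modes, so escalation never relaxes."""
    return a if _ORDER.index(a) >= _ORDER.index(b) else b


def suggested_mode(requested: Mode, fraction_remaining: float) -> Mode:
    if fraction_remaining < 0.05:
        floor: Mode = "handle_only"
    elif fraction_remaining < 0.20:  # exactly 5% lands here, not in handle_only
        floor = "summary"
    elif fraction_remaining <= 0.50:  # assumed: exactly 50% counts as "20-50%"
        floor = "table"
    else:
        floor = "raw"  # >50%: the caller's mode is kept as-is
    return _stricter(requested, floor)
```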

The manager is optional and off by default — kernels constructed without
one behave identically to today. DryRunResult now reports the live
budget_remaining and the escalated response_mode so callers can preview
their next invocation. The new TokenCounter protocol lets callers plug
in tiktoken or any other counter; the default is a chars/4 JSON-based
approximation with no extra dependencies. A new optional [tiktoken]
extra is reserved for the tiktoken-based counter.

Honours the existing weaver-spec invariants: every invocation still
flows through the firewall (I-01) and produces an ActionTrace (I-02);
the admin-only raw gate is preserved and applied before escalation.
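The ordering constraint (admin gate first, then budget escalation, then the handle decision) can be sketched as a pure function. Everything here is hypothetical: effective_mode, the explicit floor argument, and the rule that every non-raw frame gets a handle are inferred from the review fix about summary frames arriving without a handle, not taken from kernel.py.

```python
from typing import Literal

Mode = Literal["raw", "table", "summary", "handle_only"]
_ORDER: tuple[Mode, ...] = ("raw", "table", "summary", "handle_only")


def effective_mode(requested: Mode, is_admin: bool, floor: Mode) -> tuple[Mode, bool]:
    """Return (final mode, create_handle) for one invocation."""
    # 1. Admin-only raw gate, mirroring the Firewall: non-admin raw -> summary.
    mode: Mode = "summary" if requested == "raw" and not is_admin else requested
    # 2. Budget escalation: raise the mode to the floor, never lower it.
    if _ORDER.index(floor) > _ORDER.index(mode):
        mode = floor
    # 3. Handle decision on the final mode (assumed: only raw frames skip it).
    return mode, mode != "raw"
```

Deciding on handles only after both steps is what prevents a downgraded raw request from producing a summary frame with no handle.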
Copilot AI review requested due to automatic review settings May 15, 2026 12:55

Copilot AI left a comment

Pull request overview

Adds a cross-invocation context budget feature to agent-kernel by introducing a session-level BudgetManager that tracks cumulative token usage across multiple Kernel.invoke() calls and escalates response_mode as remaining budget shrinks.

Changes:

  • Introduces BudgetManager, TokenCounter, and default_token_counter for cumulative token budgeting and pluggable token counting.
  • Integrates budget reservation/escalation/reconciliation into Kernel.invoke() and exposes Kernel.budget.
  • Updates public exports, docs, changelog, and adds tests for budgeting behavior and kernel integration.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| tests/test_kernel.py | Adds kernel-level integration tests for cross-invocation budgeting, escalation, dry-run behavior, and reservation release on driver failure. |
| tests/test_firewall.py | Adds unit tests for token counting and BudgetManager allocation/recording/escalation behavior. |
| src/agent_kernel/kernel.py | Wires BudgetManager into invoke() (reserve before drivers, escalate mode, record usage after firewall) and adds Kernel.budget. |
| src/agent_kernel/firewall/budgets.py | Expands budgets module with TokenCounter, default_token_counter, and the new BudgetManager. |
| src/agent_kernel/firewall/__init__.py | Re-exports budget manager and token counting APIs. |
| src/agent_kernel/errors.py | Adds BudgetExhausted error type. |
| src/agent_kernel/__init__.py | Exports BudgetManager, BudgetExhausted, and token counting APIs at top level. |
| pyproject.toml | Adds optional [tiktoken] extra. |
| docs/context_firewall.md | Documents cross-invocation budgeting and the escalation table. |
| CHANGELOG.md | Adds an entry describing the new budgeting feature and exports. |

Comment thread src/agent_kernel/firewall/budgets.py Outdated
Comment thread src/agent_kernel/firewall/budgets.py Outdated
Comment thread src/agent_kernel/kernel.py Outdated
Comment thread src/agent_kernel/kernel.py Outdated
Comment thread tests/test_firewall.py
claude and others added 2 commits May 15, 2026 13:05
Five fixes from the Copilot review:

1. Bare ValueError on BudgetManager validation violated AGENTS.md
   ("never raise bare ValueError to callers"). Replaced with a new
   BudgetConfigError(AgentKernelError); updated tests.

2. firewall/budgets.py exceeded the ≤300 line guideline. Split into:
   - budgets.py (28 lines, original Budgets dataclass only)
   - token_counting.py (41 lines, TokenCounter + default_token_counter)
   - budget_manager.py (275 lines, BudgetManager + helpers)
   Public imports unchanged; everything re-exported via firewall/__init__.

3. invoke() did not mirror the Firewall's admin-only raw gate. A
   non-admin requesting raw kept effective_mode == "raw", which made
   the kernel skip handle creation even though the Firewall would then
   downgrade to summary — yielding a summary frame without a handle.
   The kernel now applies the same raw → summary downgrade before the
   budget escalation and handle-creation decision. Added a regression
   test covering the case.

4. A Firewall exception after a budget reservation permanently leaked
   the reserved tokens. Wrapped the firewall transform + reconciliation
   in try/finally that releases the reservation if record_usage never
   ran. Added a regression test using a stub failing Firewall.

5. Updated firewall tests to assert BudgetConfigError instead of
   ValueError, and verified BudgetConfigError is a subclass of
   AgentKernelError.

make ci: lint clean, mypy clean, 403 tests pass (was 400; +3 from
this change), all examples run.
… test

- BudgetExhausted and BudgetConfigError docstrings referenced
  firewall.budgets.BudgetManager (pre-split path); now point to
  firewall.budget_manager.BudgetManager
- Add test_suggested_mode_boundary_exactly_five_percent_is_summary_not_handle_only
  to confirm 5% exactly lands in the summary bucket, not handle_only;
  guards against a future < → <= comparator change
@dgenio merged commit b4fa1d2 into main May 19, 2026
4 checks passed
@dgenio deleted the claude/triage-issues-WowaN branch May 19, 2026 05:52

Development

Successfully merging this pull request may close these issues.

Cross-invocation context budget manager

3 participants