perf: add adaptive reasoning effort and token-cost optimizations #166

Open
andrelncampos wants to merge 1 commit into lessweb:main from andrelncampos:fix/dynamic-reasoning-effort

@andrelncampos

This PR reduces token cost during long agent sessions by adding adaptive reasoning effort and related token-usage optimizations.

Main changes:

  • Adds RuntimeReasoningEffortManager to dynamically switch reasoning_effort between high and max.
  • Escalates to max after repeated tool failures or repeated identical tool-call loops.
  • Downgrades back to high after stable clean turns.
  • Adds cooldowns and anti-flapping behavior to avoid oscillation.
  • Integrates runtime effort changes into the session loop.
  • Reuses cached tool definitions during the loop.
  • Caches the system prompt per model.
  • Uses estimated context size for active token tracking instead of relying only on response usage.
  • Adds tests for escalation, downgrade, reset, cooldown and anti-flapping behavior.
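To illustrate the caching and token-tracking items in the list above, here is a minimal sketch. The function names (`getSystemPrompt`, `estimateContextTokens`), the `Message` shape, and the rough 4-characters-per-token ratio are assumptions made for illustration and are not taken from the PR's code.

```typescript
// Hypothetical sketch: names and the chars-per-token ratio are assumptions,
// not the PR's actual implementation.

// One cached system prompt per model name.
const systemPromptCache = new Map<string, string>();

function getSystemPrompt(
  model: string,
  build: (model: string) => string,
): string {
  let prompt = systemPromptCache.get(model);
  if (prompt === undefined) {
    // Build the prompt only the first time this model is seen.
    prompt = build(model);
    systemPromptCache.set(model, prompt);
  }
  return prompt;
}

interface Message {
  role: string;
  content: string;
}

// Rough estimate of the active context size, usable between responses
// when no fresh usage numbers are available.
function estimateContextTokens(
  messages: Message[],
  charsPerToken: number = 4,
): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / charsPerToken);
}
```

A character-based estimate is only an approximation; a real tokenizer would be more accurate but costs more to run on every turn.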

Why:
Using max reasoning effort for every turn is expensive. This change keeps the default effort lower and escalates only when runtime signals indicate that the model needs more reasoning depth.
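The escalation policy described above could be sketched roughly as follows. Only the class name `RuntimeReasoningEffortManager` and the `high`/`max` levels come from this PR; the method names (`recordTurn`, `reset`), the `EffortPolicy` fields, the `TurnSignal` shape, and all threshold values are hypothetical and chosen for illustration.

```typescript
// Hypothetical sketch: method names, fields, and thresholds are assumptions,
// not the PR's actual implementation.
type ReasoningEffort = "high" | "max";

interface EffortPolicy {
  failureThreshold: number; // consecutive tool failures before escalating
  repeatThreshold: number; // identical consecutive tool calls before escalating
  cleanTurnsToDowngrade: number; // clean turns needed to return to "high"
  cooldownTurns: number; // minimum turns between two effort changes
}

interface TurnSignal {
  toolFailed: boolean;
  toolCallSignature?: string; // e.g. tool name plus serialized arguments
}

const DEFAULT_POLICY: EffortPolicy = {
  failureThreshold: 2,
  repeatThreshold: 3,
  cleanTurnsToDowngrade: 3,
  cooldownTurns: 2,
};

class RuntimeReasoningEffortManager {
  private policy: EffortPolicy;
  private effort: ReasoningEffort = "high";
  private consecutiveFailures = 0;
  private repeatCount = 0;
  private lastSignature: string | undefined = undefined;
  private cleanTurns = 0;
  private turnsSinceChange: number;

  constructor(policy: EffortPolicy = DEFAULT_POLICY) {
    this.policy = policy;
    // Start outside the cooldown window so the first escalation is not delayed.
    this.turnsSinceChange = policy.cooldownTurns;
  }

  get current(): ReasoningEffort {
    return this.effort;
  }

  // Record the outcome of one turn and return the effort for the next turn.
  recordTurn(signal: TurnSignal): ReasoningEffort {
    this.turnsSinceChange += 1;

    // Count identical consecutive tool calls to detect loops.
    const sig = signal.toolCallSignature;
    if (sig !== undefined && sig === this.lastSignature) {
      this.repeatCount += 1;
    } else {
      this.repeatCount = sig !== undefined ? 1 : 0;
    }
    this.lastSignature = sig;

    this.consecutiveFailures = signal.toolFailed
      ? this.consecutiveFailures + 1
      : 0;

    const looping = this.repeatCount >= this.policy.repeatThreshold;
    const failing = this.consecutiveFailures >= this.policy.failureThreshold;
    this.cleanTurns = signal.toolFailed || looping ? 0 : this.cleanTurns + 1;

    // Anti-flapping: no change is allowed until the cooldown has elapsed.
    const cooledDown = this.turnsSinceChange >= this.policy.cooldownTurns;
    const stable = this.cleanTurns >= this.policy.cleanTurnsToDowngrade;
    if (this.effort === "high" && (failing || looping) && cooledDown) {
      this.switchTo("max");
    } else if (this.effort === "max" && stable && cooledDown) {
      this.switchTo("high");
    }
    return this.effort;
  }

  // Return to the initial state, e.g. at the start of a new session.
  reset(): void {
    this.switchTo("high");
    this.turnsSinceChange = this.policy.cooldownTurns;
  }

  private switchTo(next: ReasoningEffort): void {
    this.effort = next;
    this.turnsSinceChange = 0;
    this.consecutiveFailures = 0;
    this.repeatCount = 0;
    this.lastSignature = undefined;
    this.cleanTurns = 0;
  }
}
```

In a session loop, the value returned by `recordTurn` would be passed as `reasoning_effort` on the next request. Clearing the counters on every switch means a fresh run of failures or clean turns is needed before the next change, which together with the cooldown limits oscillation.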

Validation:

  • npm run format
  • npm run build
  • npm run check
  • npm test

Result:

  • 421 passing tests
  • 0 failing tests
  • 8 skipped tests
