Fix Claude effort tiers to use CLI catalog ids and bridge alias lookups #313

Merged
plusplusoneplusplus merged 2 commits into main from pr/e853b5d41-fix-claude-effort-tiers-to-use-cli-catal
Jun 10, 2026

@plusplusoneplusplus

Owner

What changed

The Claude agent provider's high effort tier failed every chat turn with:

Unsupported reasoning effort "xhigh" requested for model "unknown". Supported efforts: unknown

Root cause: the hardcoded Claude effort-tier defaults referenced model ids (claude-opus-4-7, claude-sonnet-4.6, claude-haiku-4.5) that the Claude CLI catalog never advertises. The CLI's initialize response lists short aliases (default, opus, haiku), each with its own supportedEffortLevels. The executor's exact-id metadata lookup therefore never resolved the supported efforts, and provider-default sentinel models were stripped to undefined before validation, which produced the model "unknown" variant of the error. The very-low tier also pinned the low effort on Haiku, which advertises no effort levels at all.

  • Point CLAUDE_DEFAULTS at CLI catalog aliases (haiku/sonnet/opus) and drop the pinned effort for Haiku
  • Add findClaudeCatalogModel (new claude-model-catalog.ts in coc-agent-sdk): exact, provider-default-sentinel, dotted→dashed, and family (id/name/description) matching, so legacy stored ids keep working
  • Map the CLI model description into IModelInfo so sonnet can match the default alias whose description names the family
  • Resolve provider metadata in chat-base-executor via the matcher for Claude, including provider-default turns with no model id — an unsupported effort now fails with the actual supported list instead of unknown
  • Advertise conservative supportedReasoningEfforts on the curated Claude fallback model list so validation still resolves when CLI discovery fails
  • Keep unknown tier model ids and stored efforts visible in the admin effort-tier editor instead of rendering blank selects
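
For readers skimming the matcher bullet above, here is a minimal sketch of the four-stage lookup order (exact, provider-default sentinel, dotted→dashed, family). It is not the code in claude-model-catalog.ts: the `ModelInfo` shape, the `PROVIDER_DEFAULT_SENTINEL` value, the `findCatalogModelSketch` name, and the sample descriptions are all assumptions made for illustration.

```typescript
// Hypothetical shape; the real IModelInfo in coc-agent-sdk may differ.
interface ModelInfo {
  id: string;
  name?: string;
  description?: string;
  supportedReasoningEfforts?: string[];
}

// Assumed sentinel value; the PR does not show the real one.
const PROVIDER_DEFAULT_SENTINEL = "provider-default";
const FAMILIES = ["opus", "sonnet", "haiku"] as const;

// Returns the first known family name mentioned in the text, if any.
function familyOf(text: string | undefined): string | undefined {
  const lower = (text ?? "").toLowerCase();
  return FAMILIES.find((family) => lower.includes(family));
}

function findCatalogModelSketch(
  catalog: ModelInfo[],
  requested: string | undefined,
): ModelInfo | undefined {
  // 1. Exact id match.
  if (requested) {
    const exact = catalog.find((m) => m.id === requested);
    if (exact) return exact;
  }
  // 2. Provider-default turns (sentinel or no model id) map to the "default" alias.
  if (!requested || requested === PROVIDER_DEFAULT_SENTINEL) {
    return catalog.find((m) => m.id === "default");
  }
  // 3. Dotted to dashed, e.g. "claude-sonnet-4.6" becomes "claude-sonnet-4-6".
  const dashed = requested.replace(/\./g, "-");
  const byDashed = catalog.find((m) => m.id === dashed);
  if (byDashed) return byDashed;
  // 4. Permissive family match against id, name, and description.
  const family = familyOf(requested);
  if (!family) return undefined;
  return catalog.find((m) =>
    [m.id, m.name, m.description].some((field) => familyOf(field) === family),
  );
}

// Sample catalog; ids follow the PR description, descriptions are made up.
const sampleCatalog: ModelInfo[] = [
  { id: "default", description: "Sonnet 4.6" },
  { id: "opus", description: "Opus 4.8" },
  { id: "haiku", description: "Haiku" },
];
```

With this sample catalog, a legacy id such as claude-sonnet-4.6 falls through to stage 4 and lands on the default alias because its description names the Sonnet family.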

Notes for reviewers

  • The live CLI catalog was verified directly via the same control_request initialize protocol listModels() uses: default = Sonnet 4.6 (low/medium/high/max), opus = Opus 4.8 (low/medium/high/xhigh/max), haiku = no effort support
  • Family matching is intentionally permissive: the Claude SDK silently downgrades an unsupported effort, so over-matching is harmless while a failed lookup hard-fails the turn
  • Explicitly saved tier configs that still carry old ids resolve via family matching; re-saving from the admin UI picks up the catalog aliases
  • Tests: new matcher unit tests, tier-default updates, executor regressions (alias catalog, legacy dashed id + xhigh, provider-default + effort, actionable rejection), admin editor coverage; coc-agent-sdk (595), forge model-reasoning, and impacted coc suites all pass; coc-agent-sdk/forge/coc (incl. SPA client) build clean
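
As a rough illustration of the "actionable rejection" behaviour, the sketch below validates a requested effort against the catalog values listed in the first reviewer note. The `CatalogEntry` shape and the `checkEffortSketch` helper are hypothetical, and rejecting any effort on a model with no advertised levels (reported here as "none") is an assumption; only the alias ids, the effort lists, and the error wording come from this PR.

```typescript
// Hypothetical entry shape; field names are illustrative only.
interface CatalogEntry {
  id: string;
  label: string;
  supportedEfforts: string[];
}

// Values as reported in the reviewer note for the live CLI catalog.
const cliCatalog: CatalogEntry[] = [
  { id: "default", label: "Sonnet 4.6", supportedEfforts: ["low", "medium", "high", "max"] },
  { id: "opus", label: "Opus 4.8", supportedEfforts: ["low", "medium", "high", "xhigh", "max"] },
  { id: "haiku", label: "Haiku", supportedEfforts: [] },
];

// Returns undefined when the effort is acceptable, otherwise an error message
// that names the resolved model and its actual supported list.
function checkEffortSketch(
  catalog: CatalogEntry[],
  modelId: string,
  effort: string | undefined,
): string | undefined {
  const entry = catalog.find((m) => m.id === modelId);
  const supported = entry?.supportedEfforts ?? [];
  if (effort === undefined || supported.includes(effort)) return undefined;
  // Assumption: an empty list is rendered as "none".
  const list = supported.length > 0 ? supported.join(", ") : "none";
  return `Unsupported reasoning effort "${effort}" requested for model "${modelId}". Supported efforts: ${list}`;
}
```

Under these assumptions, xhigh passes on opus but is rejected on default with the real list (low, medium, high, max) instead of "unknown".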

🤖 Generated with Claude Code

The Claude provider's effort-tier defaults referenced model ids
(claude-opus-4-7, claude-sonnet-4.6, claude-haiku-4.5) that the Claude
CLI catalog (default/opus/haiku aliases) never advertises, so executor
reasoning-effort validation could not resolve supported efforts and the
high tier failed with 'Unsupported reasoning effort "xhigh" requested
for model "unknown". Supported efforts: unknown'.

- Point CLAUDE_DEFAULTS at CLI catalog aliases (haiku/sonnet/opus) and
  drop the pinned effort for Haiku, which advertises no effort levels
- Add findClaudeCatalogModel: exact, provider-default-sentinel,
  dotted-to-dashed, and family (id/name/description) catalog matching
- Map the CLI model description into IModelInfo so 'sonnet' can match
  the 'default' alias whose description names the family
- Resolve provider metadata in chat-base-executor via the matcher for
  Claude, including provider-default turns with no model id
- Advertise conservative supportedReasoningEfforts on the curated
  Claude fallback model list so offline validation still resolves
- Keep unknown tier model ids and stored efforts visible in the admin
  effort-tier editor instead of rendering blank selects

Tests: matcher unit tests, tier default updates, executor regression
tests for the alias catalog, legacy dashed ids, provider-default turns,
and the actionable unsupported-effort error; admin editor coverage for
unknown ids.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…turns

An SSE subscriber attaching to a running process triggers
store.requestFlush(), which invokes the executor's registered
flushConversationTurn. When the turn completed concurrently, the flush's
upsertStreamingTurn could land after appendConversationTurn(
filterStreaming) had already persisted the final assistant turn — the
upsert found no streaming row and re-inserted the buffered content as a
permanent duplicate streaming turn. The UI then rendered five timestamped
bubbles for a four-turn conversation (flaky e2e
queue-conversation-mock.spec.ts 'full conversation' under CI load).

- Track turnFinalized per process session; streaming flushes no-op once
  the turn's final assistant turn is persisted
- Serialize streaming flushes and final appends through a per-process
  turnWriteChain so they cannot interleave on async stores
- Route the five filterStreaming append sites (chat, follow-up success/
  error, lifecycle-runner success/error) through the new
  appendFinalConversationTurn helper
- Flush still snapshots buffer/timeline synchronously at call time so
  throttled flushes persist progressively growing content; a flush after
  session cleanup stays a no-op

Tests: base-executor-flush-race.test.ts covers late flush after final
append, flush-then-append replacement, concurrent ordering both ways,
active-turn flushing, post-cleanup no-op, and multi-turn reset; executor,
queue-bridge, SSE-replay suites and the e2e mock spec (3x) pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@plusplusoneplusplus

Owner Author

Added a second commit fixing the e2e (3) failure from the first run. The failure is not specific to this PR's changes, but it is a real server-side race rather than test flakiness:

Root cause: when an SSE subscriber attaches to a running process, sse-handler calls store.requestFlush(). If the turn completes concurrently, the flush's upsertStreamingTurn can land after the executor's final appendConversationTurn(filterStreaming), find no streaming row, and re-insert the streamed content as a permanent duplicate streaming turn. That is why queue-conversation-mock.spec.ts saw 5 timestamped bubbles for a 4-turn conversation, consistently across retries under CI load.

Fix: BaseExecutor now tracks turnFinalized per session and serializes streaming flushes + final appends through a per-process write chain (appendFinalConversationTurn). Late flushes no-op; flush-then-append still replaces the streaming turn via filterStreaming. Regression tests in base-executor-flush-race.test.ts cover both orderings, concurrent execution, post-cleanup flushes, and multi-turn reset.
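
The serialization idea can be sketched with a plain promise chain. This is a simplified stand-in, not the BaseExecutor code: `InMemoryStoreSketch`, `TurnWriterSketch`, and their method names are hypothetical, and only the turnFinalized flag, the write chain, and the synchronous buffer snapshot mirror what the fix describes.

```typescript
type Turn = { content: string; streaming: boolean };

// Hypothetical async store standing in for the real process store.
class InMemoryStoreSketch {
  turns: Turn[] = [];

  async upsertStreamingTurn(content: string): Promise<void> {
    await Promise.resolve(); // simulate async I/O
    const existing = this.turns.find((t) => t.streaming);
    if (existing) existing.content = content;
    else this.turns.push({ content, streaming: true });
  }

  async appendFinalTurn(content: string): Promise<void> {
    await Promise.resolve(); // simulate async I/O
    // Analogue of filterStreaming: drop the streaming row, then append the final turn.
    this.turns = this.turns.filter((t) => !t.streaming);
    this.turns.push({ content, streaming: false });
  }
}

class TurnWriterSketch {
  private store: InMemoryStoreSketch;
  private buffer = "";
  private turnFinalized = false;
  private turnWriteChain: Promise<void> = Promise.resolve();

  constructor(store: InMemoryStoreSketch) {
    this.store = store;
  }

  // Multi-turn reset: a new turn may be flushed again.
  beginTurn(): void {
    this.buffer = "";
    this.turnFinalized = false;
  }

  appendChunk(chunk: string): void {
    this.buffer += chunk;
  }

  // Run tasks strictly one after another; a failed task does not break the chain.
  private enqueue(task: () => Promise<void>): Promise<void> {
    const next = this.turnWriteChain.then(task);
    this.turnWriteChain = next.catch(() => undefined);
    return next;
  }

  flushStreaming(): Promise<void> {
    const snapshot = this.buffer; // snapshot synchronously at call time
    return this.enqueue(async () => {
      if (this.turnFinalized) return; // late flush is a no-op
      await this.store.upsertStreamingTurn(snapshot);
    });
  }

  appendFinal(content: string): Promise<void> {
    return this.enqueue(async () => {
      await this.store.appendFinalTurn(content);
      this.turnFinalized = true;
    });
  }
}
```

Because both write paths go through `enqueue`, a flush requested after the final append sees `turnFinalized` and does nothing, while a flush requested before it completes first and is then replaced by the final turn.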

🤖 Generated with Claude Code

@plusplusoneplusplus plusplusoneplusplus merged commit 8a173d3 into main Jun 10, 2026
36 checks passed
@plusplusoneplusplus plusplusoneplusplus deleted the pr/e853b5d41-fix-claude-effort-tiers-to-use-cli-catal branch June 10, 2026 21:47