
fix(card): adaptive stall_finalize threshold + false-positive recovery #137

Merged
Time4Mind merged 1 commit into main from fix/stall-finalize-adaptive
Jun 17, 2026

Conversation

@Time4Mind

Owner

Summary

  • Patch 1 (adaptive threshold): split STALL_FINALIZE_AFTER_SECONDS by tail event type. tool_use → 300 s, text / thinking → 90 s. This avoids the metrics-debug class of false positives, where Claude is still reasoning toward a final answer after a slow tool.
  • Patch 2 (recovery): maybe_finalize_stalled now arms CardState.stall_finalized=True. If a genuine assistant turn lands afterwards, update_session_card / finalize_task wipe the binding via _recover_from_false_stall, and _send_card spawns a fresh card below the stalled stub (…continued header marker).

The stalled stub stays in chat history — we don't rewrite it. The recovery card appears below with the real answer.

Context

On 2026-06-17 a long metrics debug session emitted its final BT_fin/CDM race answer to the tmux pane and the JSONL, but stall_finalize had already fired with idle=96 s tail=tool_use. Subsequent switcher taps painted empty stubs (len=33 edits to msg=7179) and the user never saw the answer in TG. The 90 s blanket threshold was the root cause; the silent edit of the finalized card was the user-visible part.

Test plan

  • Updated tests/test_stalled_card.py: tool_use tail now uses the longer threshold; added test_tool_use_within_extended_threshold_no_fire for the metrics-debug regression and test_stall_arms_recovery_flag for the flag.
  • New tests/ccbot/handlers/test_stall_recovery.py: _recover_from_false_stall helper, update_session_card and finalize_task paths after a stalled stub.
  • Updated tests/e2e/test_stalled_finalize.py: per-tail-type threshold in the seeded card.
  • uv run ruff check — clean.
  • uv run pyright src/ccbot/handlers/ — 0 errors.
  • uv run pytest — 731 passing (full suite) + 14 e2e green.

🤖 Generated with Claude Code

Two regressions on a long-reasoning metrics-debug session: (1)
``STALL_FINALIZE_AFTER_SECONDS=90s`` fired on ``tail=tool_use idle=96s``
while Claude was legitimately thinking toward the final answer; (2)
once the STALL_NOTE was appended, the real answer landing later got
silently edited into the now-finalized card the user had scrolled
past.

Fix (1): split the threshold by tail event type. ``tool_use`` tails
get ``STALL_FINALIZE_TOOL_USE_SECONDS=300s`` because slow tools and
post-tool reasoning are routinely silent for minutes; ``text`` /
``thinking`` tails keep the original 90s because mid-emit silence is
genuinely suspicious.
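
The per-tail-type split described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the two constants and their values come from the description, while `stall_threshold_for`, `should_finalize_stalled`, and the `>=` comparison are hypothetical choices made for the example.

```python
# Threshold values quoted in the PR description.
STALL_FINALIZE_AFTER_SECONDS = 90.0       # text / thinking tails
STALL_FINALIZE_TOOL_USE_SECONDS = 300.0   # tool_use tails


def stall_threshold_for(tail_type: str) -> float:
    """Return the idle threshold in seconds for the card's last event type.

    Hypothetical helper: the real lookup may be structured differently.
    """
    if tail_type == "tool_use":
        return STALL_FINALIZE_TOOL_USE_SECONDS
    # text, thinking, and any other tail keep the original threshold.
    return STALL_FINALIZE_AFTER_SECONDS


def should_finalize_stalled(tail_type: str, idle_seconds: float) -> bool:
    """Decide whether a card has been idle long enough to count as stalled.

    Assumption: an inclusive comparison; the PR does not say which is used.
    """
    return idle_seconds >= stall_threshold_for(tail_type)
```

Under these assumptions, the incident's `tail=tool_use idle=96s` case no longer fires, while a `text` tail idle for the same 96 s still does.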

Fix (2): ``maybe_finalize_stalled`` now arms ``CardState
.stall_finalized=True`` after the STALL_NOTE lands. The next call to
``update_session_card`` or ``finalize_task`` runs
``_recover_from_false_stall`` — wipes msg_id / events / pagination
and flips ``is_continuation=True`` so ``_send_card`` spawns a fresh
card below the stub instead of clobbering it.
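
A rough sketch of the recovery step, under stated assumptions: the `CardState` below is a stand-in with only the fields the description mentions (the `page` field stands for "pagination"), and `recover_from_false_stall` is an illustrative version of `_recover_from_false_stall`, not its actual body. Clearing `stall_finalized` after recovery is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CardState:
    """Stand-in for the real CardState; field names are partly hypothetical."""
    msg_id: Optional[int] = None
    events: List[str] = field(default_factory=list)
    page: int = 0                    # stands in for pagination state
    stall_finalized: bool = False    # armed after the STALL_NOTE lands
    is_continuation: bool = False    # tells the sender to start a fresh card


def recover_from_false_stall(state: CardState) -> bool:
    """Detach the state from a stalled stub so the next send creates a new card.

    Returns True if a recovery happened, False if the card was never stalled.
    """
    if not state.stall_finalized:
        return False
    state.msg_id = None           # drop the binding to the stub message
    state.events.clear()          # the stub keeps its own rendered history
    state.page = 0                # reset pagination
    state.stall_finalized = False # assumption: the flag is one-shot
    state.is_continuation = True  # next card is marked as a continuation
    return True
```

Because the message binding is dropped rather than reused, the stub in chat history is left untouched and the real answer goes into a new message.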

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Time4Mind Time4Mind merged commit 5be2eaa into main Jun 17, 2026
4 checks passed
@Time4Mind Time4Mind deleted the fix/stall-finalize-adaptive branch June 17, 2026 10:38