fix(card): adaptive stall_finalize threshold + false-positive recovery by Time4Mind · Pull Request #137 · Time4Mind/ccbot

Time4Mind · 2026-06-17T07:13:48Z

Summary

Patch 1 (adaptive threshold): split STALL_FINALIZE_AFTER_SECONDS by tail event type. tool_use → 300 s, text / thinking → 90 s. Avoids the metrics-debug class of false positives, where Claude is still reasoning toward a final answer after a slow tool call.
Patch 2 (recovery): maybe_finalize_stalled now arms CardState.stall_finalized=True. If a genuine assistant turn lands afterwards, update_session_card / finalize_task wipe the card binding via _recover_from_false_stall, and _send_card spawns a fresh card below the stalled stub (…continued header marker).

The stalled stub stays in chat history — we don't rewrite it. The recovery card appears below with the real answer.

Context

On 2026-06-17 a long metrics debug session emitted its final BT_fin/CDM race answer to the tmux pane and the JSONL, but stall_finalize had already fired with idle=96 s tail=tool_use. Subsequent switcher taps painted empty stubs (len=33 edits to msg=7179) and the user never saw the answer in TG. The 90 s blanket threshold was the root cause; the silent edit of the finalized card was the user-visible part.
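To make the regression concrete, here is a minimal sketch of a per-tail-type threshold check applied to the idle=96 s, tail=tool_use case. The two constant names and their values come from this PR; the helper names `stall_threshold_for` and `should_finalize_stalled`, and the `>=` comparison, are assumptions for illustration and may not match the actual code in ccbot.

```python
# Values taken from the PR description and commit message.
STALL_FINALIZE_AFTER_SECONDS = 90.0      # text / thinking tails
STALL_FINALIZE_TOOL_USE_SECONDS = 300.0  # tool_use tails


def stall_threshold_for(tail_type: str) -> float:
    """Hypothetical helper: choose the threshold from the last event type."""
    if tail_type == "tool_use":
        # Slow tools and post-tool reasoning can be silent for minutes.
        return STALL_FINALIZE_TOOL_USE_SECONDS
    # text / thinking: silence in the middle of an emit is suspicious sooner.
    return STALL_FINALIZE_AFTER_SECONDS


def should_finalize_stalled(tail_type: str, idle_seconds: float) -> bool:
    """Hypothetical check: has the card been idle past its threshold?

    The inclusive comparison (>=) is an assumption.
    """
    return idle_seconds >= stall_threshold_for(tail_type)


if __name__ == "__main__":
    # The incident: 96 s of silence after a tool_use tail.
    old_behaviour = 96.0 >= STALL_FINALIZE_AFTER_SECONDS  # blanket 90 s rule
    new_behaviour = should_finalize_stalled("tool_use", 96.0)
    print(old_behaviour, new_behaviour)
```

Under the blanket 90 s rule the incident crosses the threshold; with the split, a tool_use tail idle for 96 s stays well inside the 300 s window, while a text or thinking tail idle for the same time would still be finalized.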

Test plan

Updated tests/test_stalled_card.py: tool_use tail now uses the longer threshold; added test_tool_use_within_extended_threshold_no_fire for the metrics-debug regression and test_stall_arms_recovery_flag for the flag.
New tests/ccbot/handlers/test_stall_recovery.py: _recover_from_false_stall helper, update_session_card and finalize_task paths after a stalled stub.
Updated tests/e2e/test_stalled_finalize.py: per-tail-type threshold in the seeded card.
uv run ruff check — clean.
uv run pyright src/ccbot/handlers/ — 0 errors.
uv run pytest — 731 passing (full suite) + 14 e2e green.

🤖 Generated with Claude Code

Two regressions on a long-reasoning metrics-debug session: (1) ``STALL_FINALIZE_AFTER_SECONDS=90s`` fired on ``tail=tool_use idle=96s`` while Claude was legitimately thinking toward the final answer; (2) once the STALL_NOTE was appended, the real answer landing later got silently edited into the now-finalized card the user had scrolled past.

Fix (1): split the threshold by tail event type. ``tool_use`` tails get ``STALL_FINALIZE_TOOL_USE_SECONDS=300s`` because slow tools and post-tool reasoning are routinely silent for minutes; ``text`` / ``thinking`` tails keep the original 90s because mid-emit silence is genuinely suspicious.

Fix (2): ``maybe_finalize_stalled`` now arms ``CardState.stall_finalized=True`` after the STALL_NOTE lands. The next call to ``update_session_card`` or ``finalize_task`` runs ``_recover_from_false_stall``, which wipes msg_id / events / pagination and flips ``is_continuation=True`` so ``_send_card`` spawns a fresh card below the stub instead of clobbering it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
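The recovery step in Fix (2) can be sketched roughly as follows. This is not the ccbot implementation: the `CardState` fields other than `stall_finalized` and `is_continuation` (`msg_id`, `events`, `page`) are guessed from the wording "msg_id / events / pagination", the public function name is hypothetical, and clearing the flag after recovery is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CardState:
    """Illustrative stand-in; field names beyond the two flags are assumptions."""
    msg_id: Optional[int] = None            # chat message the card is bound to
    events: List[str] = field(default_factory=list)
    page: int = 0                           # pagination cursor
    stall_finalized: bool = False           # armed when the STALL_NOTE lands
    is_continuation: bool = False           # next send opens a fresh card


def recover_from_false_stall(card: CardState) -> bool:
    """Detach the card from a stalled stub; return True if recovery ran."""
    if not card.stall_finalized:
        return False
    # Drop the binding so the stub in chat history is left untouched.
    card.msg_id = None
    card.events.clear()
    card.page = 0
    # Assumption: the flag is cleared so recovery runs only once per stall.
    card.stall_finalized = False
    card.is_continuation = True
    return True


if __name__ == "__main__":
    card = CardState(msg_id=7179, events=["tool_use"], page=2,
                     stall_finalized=True)
    print(recover_from_false_stall(card), card.msg_id, card.is_continuation)
```

Because the binding is wiped rather than reused, whatever sends the next card has no message to edit and must post a new one, which is what keeps the stalled stub from being overwritten.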

Time4Mind merged commit 5be2eaa into main Jun 17, 2026
4 checks passed

Time4Mind deleted the fix/stall-finalize-adaptive branch June 17, 2026 10:38

Time4Mind merged 1 commit into main from fix/stall-finalize-adaptive
