Duplicate / stale renderer frames leave the REPL visually corrupted until manual /repaint #26

Description

@xjdr-noumena

Symptom

REPL state changes occasionally leave stale or duplicate frames on screen, visible as duplicated lines, ghost transcript rows, scrolled-but-not-cleared regions, or partial overlap between the alt-screen and main-screen outputs. The user-visible recovery is to run /repaint (alias /redraw), which calls forceRedraw({ clearBeforePaint: true }) and writes a renderer diagnostic artifact under $NCODE_CONFIG_DIR/debug/repaint-*. This has been an ongoing, multi-month thorn.

Existing mitigation surface (evidence the issue is real but not eliminated)

  • /repaint command (src/commands/repaint/index.ts, isHidden: true) is a user-facing recovery entrypoint rather than a fix, and writes a bounded diagnostic artifact via src/commands/repaint/repaint.ts.
  • That diagnostic carries a corruption taxonomy (src/commands/repaint/repaint.ts:50-55) with six verdicts: clean, terminal_only, front_corrupt, logical_corrupt, backframe_corrupt, mixed_or_unknown. Four of the six are corruption categories — the renderer explicitly knows it can produce corrupted front/back/logical frames.
  • src/session/replTranscriptResetRedraw.ts exposes requestReplTranscriptResetRedraw() as the reset path, called only when the user runs /repaint (no automatic trigger when corruption is detected).
  • src/ink/ink.tsx:861 acknowledges a known double-render race: lodash debounce sees timeSinceLastCall >= wait → leadingEdge fires IMMEDIATELY → double render ~0.1ms apart → jank. The comment describes the race, not its elimination.
  • src/ink/ contains 38 regression test files dedicated to flicker / duplicate / frame-corruption behavior: replFlickerOracle, logUpdateFlickerRegression, inkRecoveryBehavior, inkCompactBaseline, altScreenResizePolicy, overlayInvalidation, layoutDamageRows, replVisibleScreenContract, replTranscriptScreenContract, replPtyTranscriptScreenContract, replToolResultMountedContract, replSubmitAssistantTurn, replTypingAfterReplyTrace, selectionDragAutoscrollFlicker, tmuxResetSequence, among others. Each test represents a previously-hit regression, not a closed class of bugs.
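To make the missing automatic trigger concrete, here is a minimal sketch of row-level corruption detection that requests a redraw on its own. It is not the project's code: diffRows, redrawIfCorrupt, and the requestRedraw callback are hypothetical names, and the callback only stands in for a reset path such as requestReplTranscriptResetRedraw(), whose real signature is not shown in this issue.

```typescript
// Hypothetical helper: list the row indices where the observed screen
// differs from the computed logical frame.
function diffRows(
  logical: readonly string[],
  observed: readonly string[],
): number[] {
  const rows = Math.max(logical.length, observed.length);
  const mismatched: number[] = [];
  for (let row = 0; row < rows; row++) {
    // A row missing on either side is compared as an empty string, so a
    // ghost row left on screen shows up as a mismatch.
    if ((logical[row] ?? "") !== (observed[row] ?? "")) {
      mismatched.push(row);
    }
  }
  return mismatched;
}

// Hypothetical automatic trigger: request a redraw as soon as any row
// differs, instead of waiting for the user to run /repaint.
function redrawIfCorrupt(
  logical: readonly string[],
  observed: readonly string[],
  requestRedraw: () => void, // assumption: stands in for the real reset path
): boolean {
  const corrupt = diffRows(logical, observed).length > 0;
  if (corrupt) {
    requestRedraw();
  }
  return corrupt;
}
```

A check like this only covers the case where the observed rows can be read back or reconstructed; it says nothing about which of the six verdicts applies.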

Expected outcome

The renderer produces exactly one canonical frame per logical state change, with no diff between observed-on-screen content and the computed logical frame, across:

  • resize events (switches between alt-screen and main-screen, terminal resize),
  • scroll/damage events in ScrollBox and similar containers,
  • transcript append, tool-result mounting, and assistant-turn submission,
  • pty startup, tmux reset sequences, and overlay transitions.
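One way to approach the "one canonical frame" goal is to coalesce render requests so that a burst of state changes produces a single paint of the latest state. The sketch below is illustrative only: FrameScheduler, paint, and defer are hypothetical names, not part of src/ink/, and it does not use or replace the lodash debounce mentioned above.

```typescript
// Hypothetical sketch: coalesce render requests so that each scheduling
// tick paints at most once, always with the most recent state.
class FrameScheduler<S> {
  private pending = false;
  private latest: S | undefined = undefined;
  private readonly paint: (state: S) => void;
  private readonly defer: (fn: () => void) => void;

  constructor(
    paint: (state: S) => void,
    defer: (fn: () => void) => void, // assumption: caller supplies the tick source
  ) {
    this.paint = paint;
    this.defer = defer;
  }

  // Record the newest state; schedule a flush only if none is queued yet.
  request(state: S): void {
    this.latest = state;
    if (this.pending) {
      return;
    }
    this.pending = true;
    this.defer(() => this.flush());
  }

  // Paint the latest recorded state exactly once for this tick.
  private flush(): void {
    this.pending = false;
    this.paint(this.latest as S);
  }
}

// Usage with a manual queue standing in for a timer or microtask.
const paintedFrames: string[] = [];
const tickQueue: Array<() => void> = [];
const scheduler = new FrameScheduler<string>(
  (frame) => { paintedFrames.push(frame); },
  (fn) => { tickQueue.push(fn); },
);
scheduler.request("append row 1");
scheduler.request("append row 2");
for (const run of tickQueue.splice(0)) {
  run();
}
```

Because there is no separate leading-edge path, two requests that arrive close together cannot each trigger their own paint. The trade-off is that intermediate states are never painted, which may or may not be acceptable for every event source listed above.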

/repaint should become unnecessary as a recovery path. Diagnostic verdicts other than clean should be a hard test failure, not a user-run command output.
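As a sketch of what "hard test failure" could look like, the helper below throws on any verdict other than clean. The six verdict names come from the taxonomy cited above; the RepaintVerdict type alias, assertCleanVerdict, and its context parameter are hypothetical and would need to be adapted to whatever src/commands/repaint/repaint.ts actually exports.

```typescript
// Verdict names are taken from the taxonomy described in this issue; the
// type alias itself is a hypothetical restatement, not the project's type.
type RepaintVerdict =
  | "clean"
  | "terminal_only"
  | "front_corrupt"
  | "logical_corrupt"
  | "backframe_corrupt"
  | "mixed_or_unknown";

// Hypothetical test helper: fail loudly on any non-clean verdict.
function assertCleanVerdict(verdict: RepaintVerdict, context: string): void {
  if (verdict !== "clean") {
    throw new Error(
      `renderer verdict "${verdict}" after ${context}; expected "clean"`,
    );
  }
}
```

A regression test could call assertCleanVerdict after each resize, scroll, or transcript step, so that a non-clean verdict fails the test run rather than surfacing later as /repaint output.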

Out of scope (intentionally)

  • Specific model attribution — this is independent of inference transport.
  • Terminals without alt-screen support (graceful degradation, but not the primary failure surface).

Tracking

This is a long-standing issue. The presence of /repaint, the corruption taxonomy, the acknowledged double-render race, and the 38-file regression surface together demonstrate that the issue is real but the underlying pipeline has not been repaired end-to-end. This issue exists to track the durable fix, not another regression patch.

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)
