Evidence-match check on reviewer citations (B7) #8

@azevedo

Status

Declined entirely — YAGNI.

Context

ce-code-review's Stage 5b is a validator pass that re-checks each surviving finding's file:line citation against the actual source. The pattern serves two distinct purposes:

  1. Stale-finding check (ce's primary motivation). After autofix mode applies safe fixes, line numbers shift, so findings cited at pre-fix lines become stale. Stage 5b catches this drift. It runs only in ce's non-interactive (externalization) modes.
  2. Anti-hallucination check (secondary). LLM reviewers sometimes cite line numbers that don't exist, describe code that doesn't match what's there, or reference functions/variables not in the diff. Independent of autofix.

For /ba:review, only (2) is relevant — there's no autofix mode applying changes mid-flow. The candidate's value is purely anti-hallucination.

Decision

Declined. The user has not experienced hallucinated-citation pain in practice, and building mitigation infrastructure before observing the problem is the textbook YAGNI failure.

Design captured (if revisited)

Two tiers were considered:

Tier 1 — Existence check

  • Verify cited file exists in the diff or repo
  • Verify cited line ≤ file length
  • Cheap (file list and line counts are already in orchestrator context, no extra reads)
  • Catches the most flagrant hallucinations (line numbers beyond file length, files not in the diff)
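
The Tier 1 checks could be sketched roughly as follows. This is a minimal sketch under assumptions, not the design's implementation: the `Citation` type, the `check_existence` function, and the `line_counts` mapping (file path to line count, standing in for whatever the orchestrator already holds in context) are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """A reviewer's file:line citation (hypothetical shape)."""
    path: str
    line: int  # 1-indexed


def check_existence(citation: Citation, line_counts: dict[str, int]) -> Optional[str]:
    """Return None if the citation passes Tier 1, else a short failure reason.

    line_counts maps each known file path to its number of lines; it is an
    assumed stand-in for the file list and line counts already in context.
    """
    if citation.path not in line_counts:
        return "file not found"
    if not 1 <= citation.line <= line_counts[citation.path]:
        return "line out of range"
    return None
```

Because the check only consults a mapping that is already in memory, it needs no extra file reads, which matches the "cheap" framing above.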

Tier 2 — Content match

  • Re-read file around line ± N (likely N = 5)
  • Verify reviewer's prose description plausibly matches the code at that location
  • Higher token cost per finding (~few hundred tokens of re-read per check)
  • Catches subtler hallucinations (described code differs from actual code at cited location)
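
A sketch of the Tier 2 re-read window, assuming the file is available as a list of lines. The function names are hypothetical. Judging whether prose "plausibly matches" code would likely need a model call, so `mentions_present` below is only a crude stand-in (an assumption, not something this issue specifies): it checks that backtick-quoted identifiers in the description appear in the window.

```python
import re


def context_window(lines: list[str], line: int, n: int = 5) -> list[str]:
    """Return the source lines within +/- n of a 1-indexed line, clamped to the file."""
    start = max(line - 1 - n, 0)
    end = min(line + n, len(lines))
    return lines[start:end]


def mentions_present(description: str, window: list[str]) -> bool:
    """Crude proxy for a content match (an assumption, not the issue's method).

    Checks that every backtick-quoted identifier in the reviewer's description
    appears somewhere in the window. Descriptions with no quoted identifiers
    pass trivially, so this only catches a subset of mismatches.
    """
    identifiers = re.findall(r"`([^`]+)`", description)
    text = "\n".join(window)
    return all(identifier in text for identifier in identifiers)
```

With N = 5 the window is at most 11 lines, which keeps the per-finding re-read small.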

Other decisions if revisited

| Decision | Choice |
| --- | --- |
| Run order | After B6 dedup. Operate on consolidated findings, not per-reviewer outputs. |
| On failure | Demote confidence by one anchor + add *(evidence-match warning)* marker. Do NOT drop. |
| Always-on or mode-gated | Tier 1 always-on (cost negligible). Tier 2 opt-in / mode-gated. |
| Window for Tier 2 | line ± 5 initial guess; tunable on observed false-positive rate. |
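
The on-failure rule above (demote by one anchor, add the marker, never drop) might look like this. It is a sketch only: the `Finding` shape and the three-step `ANCHORS` scale are hypothetical, since the actual confidence anchors are not specified in this issue.

```python
from dataclasses import dataclass, replace

# Hypothetical confidence anchors, lowest to highest; the real scale is not
# specified in this issue.
ANCHORS = ("low", "medium", "high")
WARNING_MARKER = "*(evidence-match warning)*"


@dataclass(frozen=True)
class Finding:
    """A consolidated finding after dedup (hypothetical shape)."""
    title: str
    confidence: str


def apply_evidence_failure(finding: Finding) -> Finding:
    """Demote confidence by one anchor and append the warning marker.

    The finding is always returned, never dropped. A finding already at the
    lowest anchor stays there.
    """
    index = ANCHORS.index(finding.confidence)
    demoted = ANCHORS[max(index - 1, 0)]
    return replace(finding, title=f"{finding.title} {WARNING_MARKER}", confidence=demoted)
```

Returning a modified copy rather than filtering keeps a false positive in the validator from silently hiding a real finding.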

Partial absorption elsewhere

The file-existence check from Tier 1 is included in C2's validator pass (see the consolidation rework bundle issue) as part of schema validity. So a minimal form of "evidence match" effectively ships with the consolidation rework, framed as structural schema validation rather than as a separate anti-hallucination concept. To honor this decline strictly, the file-existence check could be removed from C2's validator scope; the current design decision includes it.

Trigger conditions for revisit

  • User notices hallucinated citations in /ba:review output regularly (e.g., multiple findings per review pointing at lines that don't exist or describing code that's not there)
  • An autofix mode is introduced (would re-motivate ce's original use case — line drift after auto-applied fixes)
  • C2 (structured output) lands and we want to push more validation into the validator pass

References

  • ce-code-review Stage 5b (evidence-match in externalization modes)
  • C2 issue — already includes file-existence as part of schema validation
