Reorg leaves gap in accepted chain as new blocks are not downloaded

When a node announces a reorg via `headers`, the agent removes the stale block's `node_block` row but never rolls back `SyncState` for the reorged height, so the replacement blocks are never downloaded.

The result is a hole in the accepted chain: the orphan block remains in `block` with no `node_block` row, the canonical block at that height is entirely absent from the database, and the child block is saved and accepted normally. The hole persists for as long as the agent process stays up, and silently corrupts spent/unspent queries: outputs spent by transactions in the missing canonical block appear unspent.

Restarting the agent repairs this: on startup it loads the saved block heights from the DB and resumes syncing from the first height not found.

## Example

Environment: production instance, mainnet, single trusted node, BCHN 29.0.0

Two damaged heights observed:

**Height 953896** (reorg on 2026-06-04)

- In DB, orphan with `accepted_by: []`: `000000000000000001e0c1c70eff0bee1c958f4e9499166dceca3357dedcb045` (34 txs)
- Canonical block, absent from DB: `0000000000000000004b32671f255b4cc6e0a1bead73616aded4e2d6d303391d` (37 txs)

**Height 951556** (reorg on 2026-05-18)

- In DB, orphan with `accepted_by: []`: `00000000000000000127178b4dc9a30742aa3f505373bf24b2812fd877b6a6c7` (1 tx)
- Canonical block, absent from DB: `000000000000000000e70f428e1b05869697093a26d951a8202d1258cd9e42fb` (3 txs)

In both cases the neighboring heights (e.g. 953895 and 953897) are saved and accepted by the node, so the accepted chain has a one-block gap — a state that should be impossible.

For contrast, height 936520 (reorg on 2026-02-02) shows the repaired outcome: both the orphan (`…ed411c59`, unaccepted) and the canonical block (`…15d4a0d0`, accepted) are present. Notably the canonical 936520's `node_block.accepted_at` is `NULL` — the signature of a block saved more than 2 hours after its timestamp (`saveBlock`, `src/agent.ts` ~1735) — while its neighbor 936521 was accepted live 37 seconds after mining. So the hole formed at 936520 too and was only backfilled by a later agent restart.

Downstream impact: a CashToken category we track had 66 UTXOs reported unspent by Chaingraph that Fulcrum/Electrum report as spent. All 66 are spent by transactions whose only `block_inclusions` row points at the unaccepted orphan at 953896. One of those spenders (`68ea268e6ebc61440b56b3bfdacd79cc161fbe5e2612cff497f29d56953a9344`) is confirmed on the real chain in exactly the missing canonical block `…d303391d`. The transactions were re-mined in the replacement block but Chaingraph never indexed it.

## Root cause

1. **`handleStaleBlocks` does not roll back SyncState** (`src/agent.ts` ~1839–1854). It calls `removeStaleBlocksForNode` (deletes the orphan's `node_block` row, `src/db.ts` ~988–1001) but never calls `syncState.blockReorganizationAtHeight(firstHeight)` — which exists for exactly this purpose (`src/components/sync-state.ts` ~122–137). SyncState therefore still reports the reorged height as synced, so `selectNextBlockToDownload` never selects the canonical replacement for download. The child block syncs normally, and `fullySyncedUpToHeight` advances past the hole.

2. **Silent failure modes** hide the problem. `acceptBlocksViaHeaders` (`src/db.ts` ~935–972) joins incoming hashes against `block` and silently inserts zero `node_block` rows for any hash with no `block` row — no error, no warning. And the in-memory `blockDb` set is populated once at startup (`src/agent.ts` ~430–432) and never updated at runtime, so `catchUpViaHeaders` (`src/agent.ts` ~1367–1430) reasons from stale knowledge of what is saved.
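
The silent drop in point 2 can be illustrated with a small in-memory stand-in for the join. This is a sketch only: `acceptHashes`, `AcceptResult`, and the short placeholder hashes are hypothetical and do not reproduce the SQL in `src/db.ts`. It shows how hashes with no saved block could be surfaced instead of ignored.

```typescript
// Hypothetical in-memory stand-in for the join in acceptBlocksViaHeaders;
// not the real src/db.ts code.
interface AcceptResult {
  accepted: string[]; // hashes that matched a saved block
  dropped: string[]; // hashes with no saved block (silently ignored today)
}

function acceptHashes(
  incoming: readonly string[],
  savedBlocks: ReadonlySet<string>
): AcceptResult {
  const accepted: string[] = [];
  const dropped: string[] = [];
  for (const hash of incoming) {
    if (savedBlocks.has(hash)) {
      accepted.push(hash);
    } else {
      dropped.push(hash);
    }
  }
  return { accepted, dropped };
}

// Placeholder hashes: "bb" was announced via headers but never saved.
const savedBlocks = new Set<string>(["aa", "cc"]);
const acceptResult = acceptHashes(["aa", "bb", "cc"], savedBlocks);
if (acceptResult.dropped.length > 0) {
  console.warn(
    `${acceptResult.dropped.length} announced hash(es) have no saved block`,
    acceptResult.dropped
  );
}
```

In the real code the equivalent check would presumably compare the number of inserted `node_block` rows against the number of hashes supplied.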

## Sequence

1. Node announces a one-block reorg at height N via `headers`; `BlockTree.updateHeaders` splices in the canonical hash and fires `onStaleBlocks` (`src/components/block-tree.ts` ~208–218).
2. `handleStaleBlocks` deletes the orphan's `node_block` row. SyncState is untouched and still considers height N synced.
3. No download of canonical N is ever scheduled (`selectNextBlockToDownload` skips "synced" heights). Block N+1 arrives, is saved, and is accepted.
4. The database is now: orphan at N (unaccepted), no canonical N, accepted N+1. Permanent until restart.

The sequence above describes a one-block reorg (the case observed in production), but the mechanism generalizes:

- **Deeper reorgs:** for a depth-d reorg replacing already-synced heights N…N+d−1, `onStaleBlocks` fires with the full stale chain, all d `node_block` rows are deleted, SyncState still reports all d heights as synced, and `selectNextBlockToDownload` schedules only the net-new heights above the old tip — leaving a d-block hole. Reorgs deeper than 8 blocks arrive via `inv` rather than `headers`, but that handler (`src/agent.ts` ~656–665) just calls `requestHeaders`, funneling into the same `updateHeaders` → `onStaleBlocks` path.
- **Flip-flop reorgs** (chain reorgs away from a block and later back to it) produce a variant hole: the canonical block row survives from its first acceptance, but its `node_block` row was deleted on the reorg-away and nothing re-inserts it — `catchUpViaHeaders` never reaches that height because SyncState reports it synced. Result: block present but unaccepted.
- The only case with no hole is a reorg at heights the agent had **not yet synced** (still catching up below the fork point) — SyncState is below the fork, so the canonical blocks download in normal course. This is why the bug specifically bites in steady-state operation at tip.
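
The mechanism can be reproduced with a toy model. `ToySyncState`, `rollBackTo`, `nextHeightToDownload`, and `simulateReorg` are hypothetical names for illustration; they only assume that sync state tracks a "fully synced up to" height and that downloads resume just above it, and they are not the agent's real `SyncState` or `selectNextBlockToDownload`.

```typescript
// Toy model only: illustrative names, not the agent's real classes.
class ToySyncState {
  private syncedUpTo: number;

  constructor(syncedUpTo: number) {
    this.syncedUpTo = syncedUpTo;
  }

  /** Forget everything at or above the first reorged height. */
  rollBackTo(firstReorgedHeight: number): void {
    this.syncedUpTo = Math.min(this.syncedUpTo, firstReorgedHeight - 1);
  }

  /** Next height to download, or undefined if already at the tip. */
  nextHeightToDownload(tipHeight: number): number | undefined {
    const next = this.syncedUpTo + 1;
    return next <= tipHeight ? next : undefined;
  }
}

function simulateReorg(rollBack: boolean): number | undefined {
  const n = 100; // reorged height N, already synced before the reorg
  const state = new ToySyncState(n);
  if (rollBack) {
    state.rollBackTo(n);
  }
  // After the reorg the node's tip is N+1 (replacement N plus its child).
  return state.nextHeightToDownload(n + 1);
}

const nextWithoutRollback = simulateReorg(false); // N+1: canonical N is skipped
const nextWithRollback = simulateReorg(true); // N: canonical N is downloaded first
```

Without the rollback the model jumps straight to N+1, which matches step 3 above; with it, the replacement block at N is selected first.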

## Why a restart repairs it (workaround — verified)

On restart, `registerTrustedNodeWithDb` (`src/db.ts` ~460–504) rebuilds `syncedHeaderHashChain` from `block` ⋈ `node_block`; the damaged height yields no row, producing a `null` in the chain (`blockArrayToHashChain`, `src/db.ts` ~62–80). `restoreChainForNode` then sets `fullySyncedUpToHeight = N−1`, and after header sync `fillBlockBuffer`/`selectNextBlockToDownload` schedules the canonical block at N for download; it is saved with `accepted_at = NULL` (timestamp older than 2 hours, `src/agent.ts` ~1735). We verified this end-to-end: restarting the damaged instance repaired both holes within the session, and the backfilled `node_block` rows carry the `NULL` signature. Note `repairIncompleteBlocks` does **not** catch this case — it only repairs blocks that exist with mismatched transaction counts.
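
As a rough illustration of why a `null` in the rebuilt chain puts the resume point at N−1, here is a minimal sketch. `lastContiguousHeight` and the placeholder hashes are hypothetical; the sketch assumes index `i` of the chain maps to height `startHeight + i`, which may differ from how `blockArrayToHashChain` and `restoreChainForNode` actually represent it.

```typescript
// Hypothetical helper: index i of `chain` corresponds to height startHeight + i.
// A null entry means no accepted block was found at that height.
function lastContiguousHeight(
  chain: ReadonlyArray<string | null>,
  startHeight: number
): number {
  const firstGap = chain.indexOf(null);
  const contiguousLength = firstGap === -1 ? chain.length : firstGap;
  return startHeight + contiguousLength - 1;
}

// Placeholder hashes around the damaged height 953896 (the null entry).
const rebuiltChain: Array<string | null> = ["hashA", "hashB", null, "hashD"];
const resumeBelow = lastContiguousHeight(rebuiltChain, 953894);
// resumeBelow is 953895 (N−1), so the download resumes at 953896.
```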

## Suggested fix

In `handleStaleBlocks`, roll back sync state before/alongside removing stale blocks, and re-trigger the buffer fill:

```ts
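handleStaleBlocks(staleChain: string[], firstHeight: number, nodeName: string) {
  // Roll back sync state first so the reorged heights become eligible for download again.
  this.nodes[nodeName]?.syncState?.blockReorganizationAtHeight(firstHeight);
  // Remove the stale blocks' `node_block` rows for this node.
  removeStaleBlocksForNode(this.nodes[nodeName]!.internalId!, staleChain)
    .then(() => {
      this.logger.info(staleChain, `${nodeName}: re-organization at height ${firstHeight}`);
      // Re-trigger the buffer fill so the replacement blocks get scheduled.
      this.scheduleBlockBufferFill();
    })
    .catch((err) => this.logger.error(err));
}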
```

Hardening, secondarily:

- Add saved hashes to `this.blockDb` when `saveBlock` succeeds, so `catchUpViaHeaders` reasons from current state.
- Make `acceptBlocksViaHeaders` log a warning (or fail loudly) when fewer `node_block` rows are inserted than hashes supplied — today it silently drops hashes that have no `block` row.
- Optionally extend the periodic audit to detect accepted-chain holes (height H accepted at H−1 and H+1 but not H), which would self-heal instances already damaged.
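
The hole check in the last bullet could look roughly like the sketch below. `findAcceptedChainHoles` is a hypothetical helper, not an existing function; it assumes the accepted heights for one node have already been read from the database into an array, and it reports every missing height between the lowest and highest accepted height, so it also covers multi-block holes.

```typescript
// Hypothetical audit helper; assumes the accepted heights for one node
// are already loaded into memory.
function findAcceptedChainHoles(acceptedHeights: readonly number[]): number[] {
  const sorted = acceptedHeights.slice().sort((a, b) => a - b);
  const holes: number[] = [];
  let previous: number | undefined;
  for (const height of sorted) {
    if (previous !== undefined) {
      // Every height strictly between two accepted heights is a hole.
      for (let h = previous + 1; h < height; h++) {
        holes.push(h);
      }
    }
    previous = height;
  }
  return holes;
}

// Heights around the first damaged block in the Example section.
const detectedHoles = findAcceptedChainHoles([953894, 953895, 953897, 953898]);
// detectedHoles is [953896]
```

Each reported height could then be rolled back in sync state and re-downloaded, the same way a restart does today.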

Tests and line numbers are based on commit 535e41b.

Co-authored with claude-fable-5.

