When a node announces a reorg via headers, the agent removes the stale block's node_block row but never rolls back SyncState for the reorged height, so the replacement block or blocks are never downloaded.
The result is a hole in the accepted chain: the orphan block remains in block with no node_block row, the canonical block at that height is entirely absent from the database, and the child block is saved and accepted normally. The hole persists for as long as the agent process stays up, and silently corrupts spent/unspent queries: outputs spent by transactions in the missing canonical block appear unspent.
A restart of the agent fixes this because it pulls all block heights from the DB and syncs from the first height not found.
Example
Environment: production instance, mainnet, single trusted node, BCHN 29.0.0
Two damaged heights observed:
Height 953896 (reorg on 2026-06-04)
- In DB, orphan with accepted_by: []: 000000000000000001e0c1c70eff0bee1c958f4e9499166dceca3357dedcb045 (34 txs)
- Canonical block, absent from DB: 0000000000000000004b32671f255b4cc6e0a1bead73616aded4e2d6d303391d (37 txs)
Height 951556 (reorg on 2026-05-18)
- In DB, orphan with accepted_by: []: 00000000000000000127178b4dc9a30742aa3f505373bf24b2812fd877b6a6c7 (1 tx)
- Canonical block, absent from DB: 000000000000000000e70f428e1b05869697093a26d951a8202d1258cd9e42fb (3 txs)
In both cases the neighboring heights (e.g. 953895 and 953897) are saved and accepted by the node, so the accepted chain has a one-block gap — a state that should be impossible.
For contrast, height 936520 (reorg on 2026-02-02) shows the repaired outcome: both the orphan (…ed411c59, unaccepted) and the canonical block (…15d4a0d0, accepted) are present. Notably the canonical 936520's node_block.accepted_at is NULL — the signature of a block saved more than 2 hours after its timestamp (saveBlock, src/agent.ts ~1735) — while its neighbor 936521 was accepted live 37 seconds after mining. So the hole formed at 936520 too and was only backfilled by a later agent restart.
Downstream impact: a CashToken category we track had 66 UTXOs reported unspent by Chaingraph that Fulcrum/Electrum report as spent. All 66 are spent by transactions whose only block_inclusions row points at the unaccepted orphan at 953896. One of those spenders (68ea268e6ebc61440b56b3bfdacd79cc161fbe5e2612cff497f29d56953a9344) is confirmed on the real chain in exactly the missing canonical block …d303391d. The transactions were re-mined in the replacement block but Chaingraph never indexed it.
Root cause
- handleStaleBlocks does not roll back SyncState (src/agent.ts ~1839–1854). It calls removeStaleBlocksForNode (deletes the orphan's node_block row, src/db.ts ~988–1001) but never calls syncState.blockReorganizationAtHeight(firstHeight), which exists for exactly this purpose (src/components/sync-state.ts ~122–137). SyncState therefore still reports the reorged height as synced, so selectNextBlockToDownload never selects the canonical replacement for download. The child block syncs normally, and fullySyncedUpToHeight advances past the hole.
- Silent failure modes hide the problem. acceptBlocksViaHeaders (src/db.ts ~935–972) joins incoming hashes against block and silently inserts zero node_block rows for any hash with no block row: no error, no warning. And the in-memory blockDb set is populated once at startup (src/agent.ts ~430–432) and never updated at runtime, so catchUpViaHeaders (src/agent.ts ~1367–1430) reasons from stale knowledge of what is saved.
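The silent drop can be made visible by comparing the announced hashes with the hashes that actually have a block row. The following is a minimal sketch, not code from the repository: findHashesWithoutBlockRow and its parameters are hypothetical names, and it assumes the caller holds an up-to-date set of saved block hashes.

```typescript
// Hypothetical helper, not part of the agent: given the hashes announced via
// headers and the set of hashes that have a block row, return the hashes
// that an inner join against block would silently drop.
function findHashesWithoutBlockRow(
  announcedHashes: readonly string[],
  savedBlockHashes: ReadonlySet<string>,
): string[] {
  return announcedHashes.filter((hash) => !savedBlockHashes.has(hash));
}

// Example with shortened placeholder hashes (not real block hashes).
const droppedHashes = findHashesWithoutBlockRow(
  ['aaa', 'bbb', 'ccc'],
  new Set(['aaa', 'ccc']),
);
// droppedHashes is ['bbb']
```

A non-empty result is the condition under which a warning would be appropriate.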
Sequence
- Node announces a one-block reorg at height N via headers; BlockTree.updateHeaders splices in the canonical hash and fires onStaleBlocks (src/components/block-tree.ts ~208–218).
- handleStaleBlocks deletes the orphan's node_block row. SyncState is untouched and still considers height N synced.
- No download of canonical N is ever scheduled (selectNextBlockToDownload skips "synced" heights). Block N+1 arrives, is saved, and is accepted.
- The database is now: orphan at N (unaccepted), no canonical N, accepted N+1. Permanent until restart.
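The sequence above can be reproduced with a toy in-memory model. This is only an illustration under simplifying assumptions: ToyAgent, makeToyAgent, applyReorg, and heightsToDownload are hypothetical names and do not mirror the real SyncState, BlockTree, or agent classes.

```typescript
// Toy model only. Every name below is hypothetical.
interface ToyAgent {
  syncedHeights: Set<number>; // heights the agent believes it has downloaded
  accepted: Map<number, string>; // height -> hash with a node_block-like row
}

function makeToyAgent(firstHeight: number, tipHeight: number): ToyAgent {
  const agent: ToyAgent = {
    syncedHeights: new Set<number>(),
    accepted: new Map<number, string>(),
  };
  for (let h = firstHeight; h <= tipHeight; h++) {
    agent.syncedHeights.add(h);
    agent.accepted.set(h, `old-${h}`);
  }
  return agent;
}

// The reorg removes acceptance at `height` and above. The sync state is
// rolled back only when `rollBack` is true (the proposed behavior).
function applyReorg(agent: ToyAgent, height: number, rollBack: boolean): void {
  Array.from(agent.accepted.keys()).forEach((h) => {
    if (h >= height) agent.accepted.delete(h);
  });
  if (rollBack) {
    Array.from(agent.syncedHeights).forEach((h) => {
      if (h >= height) agent.syncedHeights.delete(h);
    });
  }
}

// Only heights not marked as synced are scheduled for download.
function heightsToDownload(agent: ToyAgent, from: number, to: number): number[] {
  const wanted: number[] = [];
  for (let h = from; h <= to; h++) {
    if (!agent.syncedHeights.has(h)) wanted.push(h);
  }
  return wanted;
}

const buggyAgent = makeToyAgent(100, 102);
applyReorg(buggyAgent, 102, false);
const buggyPlan = heightsToDownload(buggyAgent, 100, 103); // [103]: 102 is never refetched

const fixedAgent = makeToyAgent(100, 102);
applyReorg(fixedAgent, 102, true);
const fixedPlan = heightsToDownload(fixedAgent, 100, 103); // [102, 103]
```

Without the rollback, the model schedules only the new tip and leaves height 102 with no accepted block, matching the hole observed in production.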
The sequence above describes a one-block reorg (the case observed in production), but the mechanism generalizes:
- Deeper reorgs: for a depth-d reorg replacing already-synced heights N…N+d−1, onStaleBlocks fires with the full stale chain, all d node_block rows are deleted, SyncState still reports all d heights as synced, and selectNextBlockToDownload schedules only the net-new heights above the old tip, leaving a d-block hole. Reorgs deeper than 8 blocks arrive via inv rather than headers, but that handler (src/agent.ts ~656–665) just calls requestHeaders, funneling into the same updateHeaders → onStaleBlocks path.
- Flip-flop reorgs (chain reorgs away from a block and later back to it) produce a variant hole: the canonical block row survives from its first acceptance, but its node_block row was deleted on the reorg-away and nothing re-inserts it: catchUpViaHeaders never reaches that height because SyncState reports it synced. Result: block present but unaccepted.
- The only case with no hole is a reorg at heights the agent had not yet synced (still catching up below the fork point) — SyncState is below the fork, so the canonical blocks download in normal course. This is why the bug specifically bites in steady-state operation at tip.
Why a restart repairs it (workaround — verified)
On restart, registerTrustedNodeWithDb (src/db.ts ~460–504) rebuilds syncedHeaderHashChain from block ⋈ node_block; the damaged height yields no row, producing a null in the chain (blockArrayToHashChain, src/db.ts ~62–80). restoreChainForNode then sets fullySyncedUpToHeight = N−1, and after header sync fillBlockBuffer/selectNextBlockToDownload schedules the canonical block at N for download; it is saved with accepted_at = NULL (timestamp older than 2 hours, src/agent.ts ~1735). We verified this end-to-end: restarting the damaged instance repaired both holes within the session, and the backfilled node_block rows carry the NULL signature. Note repairIncompleteBlocks does not catch this case — it only repairs blocks that exist with mismatched transaction counts.
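The effect of the null entry can be sketched as follows. This is a hypothetical illustration, not the real blockArrayToHashChain or restoreChainForNode: AcceptedRow, toHashChain, and lastContiguousHeight are assumed names, and the hashes are placeholders.

```typescript
// Hypothetical sketch: it only illustrates why a missing row stops the
// restored sync height just below the hole.
interface AcceptedRow {
  height: number;
  hash: string;
}

// Build a dense chain from startHeight to tipHeight; heights without an
// accepted row become null.
function toHashChain(
  rows: readonly AcceptedRow[],
  startHeight: number,
  tipHeight: number,
): (string | null)[] {
  const byHeight = new Map<number, string>();
  rows.forEach((row) => {
    byHeight.set(row.height, row.hash);
  });
  const chain: (string | null)[] = [];
  for (let h = startHeight; h <= tipHeight; h++) {
    chain.push(byHeight.get(h) ?? null);
  }
  return chain;
}

// Highest height such that every height from startHeight up to it is present.
// Returns startHeight - 1 when the first entry is already missing.
function lastContiguousHeight(
  chain: readonly (string | null)[],
  startHeight: number,
): number {
  const firstGap = chain.indexOf(null);
  return firstGap === -1
    ? startHeight + chain.length - 1
    : startHeight + firstGap - 1;
}

// Accepted rows exist at N-1 and N+1 but not at N = 953896.
const restoredChain = toHashChain(
  [
    { height: 953895, hash: 'hash-a' },
    { height: 953897, hash: 'hash-c' },
  ],
  953895,
  953897,
);
const restoredHeight = lastContiguousHeight(restoredChain, 953895); // 953895, i.e. N-1
```

With the restored height at N−1, the canonical block at N becomes the next download candidate, which is the repair observed after restart.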
Suggested fix
In handleStaleBlocks, roll back sync state before/alongside removing stale blocks, and re-trigger the buffer fill:
handleStaleBlocks(staleChain: string[], firstHeight: number, nodeName: string) {
  // Roll back sync state first so the reorged heights become eligible for download again.
  this.nodes[nodeName]?.syncState?.blockReorganizationAtHeight(firstHeight);
  removeStaleBlocksForNode(this.nodes[nodeName]!.internalId!, staleChain)
    .then(() => {
      this.logger.info(staleChain, `${nodeName}: re-organization at height ${firstHeight}`);
      // Re-trigger the buffer fill so the replacement blocks are requested.
      this.scheduleBlockBufferFill();
    })
    .catch((err) => this.logger.error(err));
}
Hardening, secondarily:
- Add saved hashes to this.blockDb when saveBlock succeeds, so catchUpViaHeaders reasons from current state.
- Make acceptBlocksViaHeaders log a warning (or fail loudly) when fewer node_block rows are inserted than hashes supplied; today it silently drops hashes that have no block row.
- Optionally extend the periodic audit to detect accepted-chain holes (height H accepted at H−1 and H+1 but not H), which would self-heal instances already damaged.
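The hole check in the last item could look roughly like this. It is a minimal sketch under assumptions: findAcceptedChainHoles is a hypothetical name, the accepted heights for one node are assumed to be loaded already over a bounded window, and the check is generalized from single-block holes to any run of missing heights.

```typescript
// Hypothetical audit helper: returns every height missing between the lowest
// and highest accepted heights supplied. Intended for a bounded window of
// heights, since the result grows with the size of each gap.
function findAcceptedChainHoles(acceptedHeights: readonly number[]): number[] {
  const sorted = Array.from(new Set(acceptedHeights)).sort((a, b) => a - b);
  const holes: number[] = [];
  let previous: number | undefined;
  for (const height of sorted) {
    if (previous !== undefined) {
      // Every height strictly between two accepted heights is a hole.
      for (let h = previous + 1; h < height; h++) holes.push(h);
    }
    previous = height;
  }
  return holes;
}

const observedHoles = findAcceptedChainHoles([953895, 953897, 953898]); // [953896]
```

Each reported height would then need its canonical block downloaded and accepted, as a restart does today.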
Tests and line numbers are based on commit 535e41b.
Co-authored with claude-fable-5.