fix(audit): serialise log_event with a file lock so concurrent writers cannot fork the hash chain#263
Open
galuis116 wants to merge 1 commit into
Open
Conversation
…s cannot fork the hash chain without the lock, two log_event calls racing on the same kb both observe the same prev_hash from _last_hash, both append to audit.log.jsonl, and verify_chain reports "previous hash mismatch" at the second concurrent entry — permanently breaking the tamper-evidence guarantee vouchdev#244/vouchdev#253 added. reachable from vouch serve handling concurrent agents, vouch serve alongside cli commands on the same kb, scripted backgrounded approvals, and scheduled expire / sync runs. holds an exclusive lock on a sibling audit.log.jsonl.lock file for the duration of read-prev-hash → derive → append. uses fcntl.flock on posix and msvcrt.locking on windows. the audit log itself is never opened in a mode that could truncate it. dynamic __import__ keeps mypy clean on both platforms. new regression spawns four multiprocessing.spawn workers writing 20 events each and asserts verify_chain.ok plus the full 80 events landed — threads hide the race behind the gil. a second test asserts the lock is released on an exception mid-write so the next call succeeds. Fixes vouchdev#262
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changed
audit.log_event(src/vouch/audit.py) now holds an exclusive cross-process lock for the duration of read-prev_hash→ derive → append. The lock is held on a siblingaudit.log.jsonl.lockfile (so the audit log itself is never opened in a mode that could truncate it) viafcntl.flockon POSIX andmsvcrt.lockingon Windows._audit_lock(kb_dir)is the single context manager;log_event's critical section is onewith _audit_lock(...)block. AddsAUDIT_LOCKFILEconstant.Why
Fixes #262. Before this PR,
log_eventdid:_last_hash(kb_dir)— read the whole log to get the lasthash.hashfrom thatprev_hash.open("a")and append.The three steps were not atomic. Two concurrent writers both observed the same
prev_hash, both chained off it, andverify_chainthen reported"previous hash mismatch"at the second concurrent event — permanently breaking the tamper-evidence guarantee #244/#253 was added to provide. Reachable fromvouch serve+ concurrent CLI, multiple agents on JSONL, scripted backgrounded approvals (vouch approve a & vouch approve b), and scheduledvouch expirerunning whilevouch sync-applywrites its own audit events. The README's headline use case (Claude Code + Cursor + a CI botsharing one repo) is exactly this scenario.The matching read surface (
verify_chain) already documents thatprev_hashmust form a monotone chain — the writer just needed to actually produce one under concurrency. Sametighten the writer to match the existing reader invariantlane as #74 (per-file sha256 on import) / #123 (write-time graph refs vsfsck) / #149 (artifact-id containment vsread_under_root).What might break
Nothing on-disk, schema, MCP/JSONL surface, or audit-log shape. Strictly additive serialisation.
os.open+flockper event. Negligible._last_hashscan, but correctness is preserved. (_last_hash's O(N) scan is a separate optimisation worth tackling in a follow-up — out of scope here.)audit.log.jsonl.lockfile appears in.vouch/. Empty, used only for locking. Not committed (covered by the existing.vouch/.gitignorestate.db-*style; if it surfaces in git it should be added to.gitignorein a follow-up).fcntlon POSIX,msvcrton Windows. The dynamic__import__pattern keepsmypy srcclean on both (fcntl.LOCK_EX/msvcrt.LK_LOCKaren't visible to mypy when the other platform's stubs are loaded).VEP
Not required — adds an internal correctness guard to match an invariant
verify_chainalready enforces; no surface change.Tests
make checkpasses locally —ruff check src testsclean,mypy src→Success: no issues found in 71 source files,pytest tests/test_audit.py8/8 green plus the broader affected modules (test_storage,test_health,test_sessions,test_cli,test_sync,test_verify) all green. (tests/test_bundle.py::test_import_apply_rejects_absolute_pathfails ontestindependent of this change — pre-existing regex-vs-message mismatch, confirmed by re-running on the parent commit.)tests/test_audit.py:test_audit_chain_survives_concurrent_writers— spawns fourmultiprocessing.spawnworkers writing 20 events each on a shared KB, assertsverify_chain.ok is Trueand the full4 × 20 = 80events landed. Uses subprocesses, not threads — the GIL serialises Python code and hides the file-level race.test_audit_lock_released_on_exception— monkey-patchesos.fsyncto raise mid-event, asserts the nextlog_eventcall succeeds (lock released, chain continues from the surviving prefix).CHANGELOG.mdupdated under## [Unreleased] ### Fixed.Closes #262.