fix(node): anchor the real old_sha and issue a per-ref certificate#72
fix(node): anchor the real old_sha and issue a per-ref certificate#72beardthelion wants to merge 7 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthrough
ChangesPer-ref old_sha propagation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai full review |
✅ Action performedFull review finished. |
|
@coderabbitai make a full review of this PR |
|
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/gitlawb-node/src/api/repos.rs (1)
826-848: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick winGossip event still emits empty
old_shathough the real value is now in hand.The destructure discards
old_shaas_old_shaat Line 826, thenpublish_ref_updatehardcodesold_sha: "".to_string()at Line 842. This is the same integrity gap this PR closes for Arweave/DB — peers receiving the gossip event get noold→newtransition. Bind and forward the real value.Proposed fix
- for (ref_name, _old_sha, new_sha) in &ref_updates_clone { + for (ref_name, old_sha, new_sha) in &ref_updates_clone { let cid = cid_map.get(new_sha).map(|s| s.as_str()); @@ p2p.publish_ref_update(crate::p2p::RefUpdateEvent { node_did: node_did_str.clone(), pusher_did: pusher_did_clone.clone(), repo: repo_slug.clone(), ref_name: ref_name.clone(), - old_sha: "".to_string(), + old_sha: old_sha.clone(), new_sha: new_sha.clone(),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/repos.rs` around lines 826 - 848, The gossip event being published via p2p.publish_ref_update() in the loop over ref_updates_clone hardcodes old_sha as an empty string instead of using the actual value available from the destructuring. Change the destructuring pattern to capture the old_sha value instead of discarding it with an underscore, and then use that captured value when constructing the RefUpdateEvent struct passed to publish_ref_update, replacing the hardcoded empty string assignment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 920-928: The GraphQL RefUpdateBroadcast is collapsing multi-ref
pushes by only emitting the first ref with a hardcoded zero old_sha. Instead of
using ref_updates_clone.first() once to create a single broadcast, loop through
each tuple in ref_updates_clone and emit one RefUpdateBroadcast per ref. For
each iteration, extract the ref_name from the first tuple element, the real
old_sha from the second tuple element, and the new_sha from the third tuple
element, rather than hardcoding old_sha to zeros and only processing the first
ref. This ensures subscribers receive all refs in a multi-ref push with their
accurate state transitions.
---
Outside diff comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 826-848: The gossip event being published via
p2p.publish_ref_update() in the loop over ref_updates_clone hardcodes old_sha as
an empty string instead of using the actual value available from the
destructuring. Change the destructuring pattern to capture the old_sha value
instead of discarding it with an underscore, and then use that captured value
when constructing the RefUpdateEvent struct passed to publish_ref_update,
replacing the hardcoded empty string assignment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e4bde16d-c752-4472-b903-6d9411879d89
📒 Files selected for processing (2)
crates/gitlawb-node/src/api/repos.rscrates/gitlawb-node/src/arweave.rs
jatmn
left a comment
There was a problem hiding this comment.
Summary
The PR correctly threads each ref's real old_sha into Arweave anchoring and issues one signed ref certificate per updated ref. I found one remaining integrity gap in downstream notifications/subscriptions that still flatten multi-ref pushes to the first ref and a zero old_sha.
Findings
- HTTP sync notifications still misreport multi-ref pushes and old_sha
In git_receive_pack, the PR updates
ef_updates_clone to include (ref_name, old_sha, new_sha) and uses that for Arweave anchoring, but the HTTP peer notification body still takes
ef_name/
ew_sha from
ef_updates_clone.first() and hardcodes old_sha to all zeros (crates/gitlawb-node/src/api/repos.rs:868-878). On the receiving side,
otify_sync persists those fields into ReceivedRefUpdate (crates/gitlawb-node/src/api/peers.rs:264-283), matching the gossipsub path that is supposed to carry per-ref provenance (crates/gitlawb-node/src/p2p/mod.rs:33-54,297-318). So after this change, a multi-ref push will anchor correctly on Arweave but peers reached via the HTTP fallback will still record only the first ref and the wrong previous SHA. That leaves the integrity bug unresolved for one of the codepaths this handler already supports.
PR #72 made the Arweave/DB anchor path per-ref with the real old_sha but left three sibling fan-out paths in git_receive_pack flattening multi-ref pushes and/or emitting a placeholder old_sha. Bring them in line: - gossip publish_ref_update now forwards the real old_sha (was ""), matching the RefUpdateEvent contract that documents all-zeros for a created ref - HTTP /sync/notify sends one signed request per ref with its real old_sha; the receiver is single-ref, so a multi-ref push no longer drops refs 2..N - GraphQL RefUpdateBroadcast emits one broadcast per ref with the real old_sha instead of only the first ref + a zero placeholder No receiver, struct, or wire-schema change; the all-zeros-for-created-ref convention is preserved on the wire. Addresses jatmn's CHANGES_REQUESTED and the two CodeRabbit findings.
Extract the HTTP /sync/notify send into notify_peer_of_ref (one ref) and notify_peer_of_refs (one peer, looping refs), collapsing the deeply-nested inline block in git_receive_pack and giving the fan-out a unit-testable seam. The ref_name is now also logged on every warn arm. Add mockito regression tests: a two-ref push must emit one request per ref each carrying its real old_sha (guards against re-flattening to .first() with a zero placeholder, the bug PR #72 fixed), and a created ref must forward the all-zeros hash verbatim. Behavior is unchanged; this only adds the seam, the tests, and ref_name in the logs.
|
@jatmn addressed in 3ed5acd. The HTTP Follow-up in ada4e55 extracts the send into a One thing worth a separate issue, not this PR: Re-requesting review. |
|
Filed the |
jatmn
left a comment
There was a problem hiding this comment.
Findings
BLOCKING
1. Merge conflict with main must be resolved
Severity: Blocking
Evidence:
gh pr view 72 --repo Gitlawb/node --json mergeablereturns"CONFLICTING".git merge-base HEAD origin/mainreportsfatal: no merge basein the local checkout.git diff origin/main...HEADis therefore not usable; the PR branch diverged frommainbefore commit1af4fdf("feat(node): pin the per-push object delta...").- The current
maincontains changes that the PR branch is missing, including:owner_push_rejectionenforcement andExtension(auth)-based identity ingit_receive_pack(the PR branch still usesheaders: HeaderMapandextract_did_from_auth).EncryptedManifest::blobschanged from&'a [(String, String, Vec<String>)]to&'a [(String, String)].pinata::pin_new_objectsandipfs_pin::pin_new_objectssignatures changed to accept per-pushcandidatesinstead of awithheldset.replication_withheld_sethelper extracted inmain.
Recommendation: Rebase or merge main into the PR branch and reconcile the conflicting ref-update fan-out logic with the new owner-push gate and pin-delta changes. Do not merge until the PR is reported as mergeable.
HIGH
2. EncryptedManifest blob tuple shape is inconsistent with current main
Severity: High
Evidence: In the PR branch, arweave.rs line 115 defines blobs: &'a [(String, String, Vec<String>)] and manifest_blob_json unpacks a 3-tuple; current main uses &'a [(String, String)] and a 2-tuple.
Impact: Once the merge conflict is resolved, the branch will not compile against main unless the tuple shape is reconciled. This is a direct conflict in the changed file.
Recommendation: During rebase, adopt main's 2-tuple shape (no recipient list in the manifest) and remove the third element from tuple destructuring in arweave.rs and the caller in repos.rs.
MEDIUM
3. Existing Arweave test still uses a 64-hex zero old_sha
Severity: Medium
Evidence: arweave.rs line 246 in test_anchor_noop_when_url_empty uses old_sha: "0000000000000000000000000000000000000000000000000000000000000000".into() (64 hex chars). The PR's commit message and new test correctly use 40-hex SHA-1 values.
Impact: Minor inconsistency in test fixtures; the production code path now uses real 40-hex SHAs from parse_ref_updates, so this does not affect runtime correctness, but it weakens the signal that the width mismatch was fixed.
Recommendation: Update the existing fixture to a 40-hex zero string to match the fixed contract.
LOW / NITS
4. notify_peer_of_ref takes many positional arguments
Severity: Low
Evidence: repos.rs lines 468–519 define notify_peer_of_ref with ten positional parameters.
Impact: The signature is error-prone for callers and reviewers, although it is only called from one place in production and from tests.
Recommendation: Consider grouping related parameters into small structs (e.g., RefUpdate { ref_name, old_sha, new_sha } and PeerContext { peer_did, notify_url, ... }) in a follow-up. This is not blocking.
5. Tests rely on real HTTP mock timing
Severity: Low
Evidence: The new sync-notify tests use mockito with .expect(1). They passed locally but are sensitive to request ordering; mockito by default matches any request that satisfies the matcher, and the two mocks overlap only by ref_name, so they are correct as written.
Recommendation: No change required; the test design is sound.
The push handler discarded the real old->new transition before anchoring and only certified the first ref of a multi-ref push, so the permanent Arweave record was a weaker, incomplete copy of facts the node had already parsed. - Thread the parsed old_sha through the clone moved into the anchor task instead of hardcoding a 64-zero placeholder. This also fixes the SHA width mismatch: object ids are 40-hex SHA-1, not 64-hex. - Issue a signed RefCertificate for every ref the push advanced, each carrying that ref's real old/new, rather than one cert for the first ref (which, on an empty push, signed a timestamp string as new_sha). Lock the anchor-body contract with a test asserting the upload carries the real old_sha and new_sha, never a placeholder. Scope is the integrity subset only. Carrying the signed certificate on Turbo, capturing the pusher signature, chaining certificates, and gl-side verification remain open under #26. Refs #26
PR #72 made the Arweave/DB anchor path per-ref with the real old_sha but left three sibling fan-out paths in git_receive_pack flattening multi-ref pushes and/or emitting a placeholder old_sha. Bring them in line: - gossip publish_ref_update now forwards the real old_sha (was ""), matching the RefUpdateEvent contract that documents all-zeros for a created ref - HTTP /sync/notify sends one signed request per ref with its real old_sha; the receiver is single-ref, so a multi-ref push no longer drops refs 2..N - GraphQL RefUpdateBroadcast emits one broadcast per ref with the real old_sha instead of only the first ref + a zero placeholder No receiver, struct, or wire-schema change; the all-zeros-for-created-ref convention is preserved on the wire. Addresses jatmn's CHANGES_REQUESTED and the two CodeRabbit findings.
Extract the HTTP /sync/notify send into notify_peer_of_ref (one ref) and notify_peer_of_refs (one peer, looping refs), collapsing the deeply-nested inline block in git_receive_pack and giving the fan-out a unit-testable seam. The ref_name is now also logged on every warn arm. Add mockito regression tests: a two-ref push must emit one request per ref each carrying its real old_sha (guards against re-flattening to .first() with a zero placeholder, the bug PR #72 fixed), and a created ref must forward the all-zeros hash verbatim. Behavior is unchanged; this only adds the seam, the tests, and ref_name in the logs.
Prefix the two new sync-notify tests with test_ and use the _mock binding convention (matching arweave.rs/pinata.rs). No behavior change.
54ba2e1 to
cc83943
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/gitlawb-node/src/api/repos.rs (2)
1071-1094: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winDon’t let best-effort peer notify block local broadcast and anchoring.
This loop awaits every peer/ref HTTP request before Lines 1099-1144 run. With per-ref fan-out, one slow peer can delay GraphQL subscribers and Arweave anchoring. Move this block after the local broadcast/anchor work, or spawn it as an isolated best-effort task.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/repos.rs` around lines 1071 - 1094, The peer notification fan-out in repos.rs is currently awaited inline, which can block the local broadcast and anchoring path. Move the announce/list_peers loop that calls notify_peer_of_refs out of the main ref-update flow so the GraphQL subscriber broadcast and Arweave anchoring run first, or run that peer-notify work as an isolated best-effort task. Keep the change localized around the announce block and the notify_peer_of_refs call sites so the existing local work is not delayed by slow peers.
210-234: 🎯 Functional Correctness | 🟠 MajorAlign paginated owner filtering with
did_matches.list_all_repos_pagedstill usesowner_did = $1 OR owner_did LIKE '%:' || $1, so a bare owner ID can match other DID methods with the same suffix. That makes?owner=...&limit=...behave differently from the unpaginated path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/repos.rs` around lines 210 - 234, The paginated branch in repos listing applies owner filtering differently from the non-paginated branch, so `list_all_repos_paged` must be updated to use the same `did_matches` semantics as the `filtered` logic in `repos.rs`. Adjust the query-building/path used by `list_all_repos_paged` so `?owner=...&limit=...` matches only the intended DID, not any DID with the same suffix, and keep the behavior consistent with `crate::api::did_matches`.Source: Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 1071-1094: The peer notification fan-out in repos.rs is currently
awaited inline, which can block the local broadcast and anchoring path. Move the
announce/list_peers loop that calls notify_peer_of_refs out of the main
ref-update flow so the GraphQL subscriber broadcast and Arweave anchoring run
first, or run that peer-notify work as an isolated best-effort task. Keep the
change localized around the announce block and the notify_peer_of_refs call
sites so the existing local work is not delayed by slow peers.
- Around line 210-234: The paginated branch in repos listing applies owner
filtering differently from the non-paginated branch, so `list_all_repos_paged`
must be updated to use the same `did_matches` semantics as the `filtered` logic
in `repos.rs`. Adjust the query-building/path used by `list_all_repos_paged` so
`?owner=...&limit=...` matches only the intended DID, not any DID with the same
suffix, and keep the behavior consistent with `crate::api::did_matches`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 191bb03d-8643-43f5-b311-975e943d972d
📒 Files selected for processing (2)
crates/gitlawb-node/src/api/repos.rscrates/gitlawb-node/src/arweave.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/gitlawb-node/src/arweave.rs
jatmn
left a comment
There was a problem hiding this comment.
Findings
BLOCKING
1. cargo fmt --check fails on test-module import order
Severity: Blocking
Evidence:
cargo fmt --check --allreports a diff incrates/gitlawb-node/src/api/repos.rsat the#[cfg(test)] mod testsblock.- Current code (line 1447):
use super::*; use gitlawb_core::identity::Keypair; use crate::auth::caller_authorized_to_push; use crate::error::AppError;
- Expected order (rustfmt groups
use crate::...before external crates):use super::*; use crate::auth::caller_authorized_to_push; use crate::error::AppError; use gitlawb_core::identity::Keypair;
- GitHub Actions
fmt + clippyrun also fails with the samecargo fmt --checkexit code.
Impact: Blocks CI and therefore merge.
Recommendation: Run cargo fmt --all and commit the resulting change.
MEDIUM
2. Best-effort peer notification now blocks local broadcast and Arweave anchoring
Severity: Medium
Evidence:
- In
crates/gitlawb-node/src/api/repos.rs, the spawned background task orders work as follows (lines 1068–1150):- Gossip publish (
for ref_updates_clone, lines 1041–1066) - HTTP peer notify loop (
if announce { ... notify_peer_of_refs(...).await; }, lines 1068–1097) - GraphQL broadcast (lines 1099–1111)
- Arweave anchoring (lines 1113–1150)
- Gossip publish (
notify_peer_of_refsawaits every peer/ref HTTP request sequentially. With per-ref fan-out, a slow or unreachable peer can now delay the GraphQL broadcast and Arweave anchoring by the sum of all HTTP round-trips.- This is inside a
tokio::spawntask, so it does not block the HTTP response, but it does delay internal downstream work that does not depend on peer notification.
Impact: Local subscribers and permanent anchors are delayed by best-effort peer notifications.
Recommendation: Move the HTTP peer-notification block after the GraphQL broadcast and Arweave anchoring, or spawn it as an independent best-effort task so that local work cannot be delayed by peer round-trips.
LOW / NITS
3. Existing Arweave test fixtures still use 64-hex zero old_sha
Severity: Low
Evidence:
crates/gitlawb-node/src/arweave.rsline 246:old_sha: "0000000000000000000000000000000000000000000000000000000000000000".into(),
crates/gitlawb-node/src/arweave.rsline 273:old_sha: "0".repeat(64),
Impact: The PR description and the new test_anchor_body_carries_real_old_sha correctly use 40-hex SHA-1 values and state that the old 64-hex placeholder was the wrong width. Leaving the older fixtures at 64 hex weakens the signal that the width mismatch was fixed, although it does not affect production correctness.
Recommendation: Update both fixtures to the canonical 40-hex zero string (0000000000000000000000000000000000000000) to match the fixed contract.
4. Webhook created check duplicates the all-zeros literal
Severity: Low
Evidence:
crates/gitlawb-node/src/api/repos.rsline 832:"created": update.old_sha == "0000000000000000000000000000000000000000",
- The file already defines
const ZERO_SHA: &str = "0000000000000000000000000000000000000000";at line 21.
Impact: Minor maintainability risk; the literal and constant could drift.
Recommendation: Use update.old_sha == ZERO_SHA in the webhook payload.
- Reorder the repos test-module imports so `use crate::...` precedes the external `gitlawb_core` import (unblocks cargo fmt --check / CI). - Webhook `created` check reuses the existing ZERO_SHA constant instead of repeating the 40-hex literal. - Arweave test fixtures use the canonical 40-hex zero old_sha, matching the SHA-1 width the per-ref anchoring contract now carries.
The post-push background task awaited the HTTP peer-notify fan-out before the GraphQL subscription broadcast and Arweave anchoring. With per-ref notification a slow or unreachable peer could delay both local subscribers and permanent anchoring by the sum of the peer round-trips. Move the peer-notify loop to run last; peers already learned of the push via the gossip publish earlier in the task, so this only deprioritizes the best-effort HTTP fallback.
Review follow-ups on the two prior commits: - The peer-notify ordering comment claimed peers 'already learned via the gossip publish', but that publish is gated on a present p2p handle; with no handle (the very case the HTTP fallback exists for) it does not run. State the actual rationale: peer-notify is the lowest-priority best-effort step, ordered last so it cannot delay local broadcast or anchoring. - The all-zeros notify test still inlined the 40-hex literal; point it at the ZERO_SHA constant like the production path.
|
All four addressed. 1 (blocking) — fmt. Fixed the test-module import order; 2 — peer notification blocking local work. Moved the HTTP peer-notify loop to run last in the background task, after the GraphQL broadcast and Arweave anchoring. Peers are notified later when Irys is slow, but that's the intended deprioritization: it's the lowest-priority best-effort step and can no longer delay local subscribers or anchoring. The whole 3 — Arweave fixtures. Both 4 — duplicated zero literal. The webhook One thing I'll leave for a separate issue rather than widen this PR: |
|
Correcting my own note above, not finding #2: I said the peer-notify path has no per-request timeout. That isn't right, and I don't want to leave it on the record. The shared The real residual is the one your #2 was pointing at: the notify loop is sequential over peers × refs, so a batch of unreachable peers can still sum to roughly peers × 10s of work. Running it after the broadcast and anchoring keeps that off the local path, which was the goal here. If peer fan-out grows, parallelizing or bounding that loop is a reasonable follow-up, but it doesn't seem to block this PR. Happy to file a tracking issue if you'd rather have it written down. |
Refs #26 (integrity subset only; the issue stays open for the foundation work).
What was wrong
The push handler built the Arweave anchor from a copy of the ref updates that had already dropped the real
old_sha, then hardcodedold_shato 64 zeros. The signedRefCertificatewith the realold -> newstayed local. On top of that, a multi-ref push only ever produced one certificate (for the first ref), and an empty push signed a timestamp string asnew_sha. The 64-zero placeholder was also the wrong width: git object ids are 40-hex SHA-1.What this changes
old_shathrough the clone moved into the anchoring task, so the anchor records the realold -> newtransition. This drops the"0".repeat(64)placeholder and fixes the width mismatch.RefCertificatefor every ref the push advanced, each carrying that ref's realold/new, instead of one cert for the first ref.No new dependencies, no schema or config changes.
Scope
Deliberately the integrity subset of #26. Carrying the signed certificate on Turbo, capturing the pusher signature, chaining certificates, and
gl-side verification against the anchor remain open under #26.Tests
arweave::tests::test_anchor_body_carries_real_old_shalocks the anchor-body contract: the upload must serialize the realold_sha/new_sha, never a placeholder.gitlawb-nodesuite green (126 passed);fmtandclippyclean.The per-ref certificate issuance and
old_shathreading live in thegit_receive_packhandler, which has no test harness (it needs a live Postgres and git), so those are verified by compile and review rather than an automated test.Summary by CodeRabbit
Release Notes
Bug Fixes
old_sha/new_shaacross pinning, gossip, sync notifications, broadcasting, and anchoring./api/v1/sync/notifyrequest per ref, including the ref’s realold_sha.Tests
/sync/notifypayloads and for anchoring payloads carrying the realold_sha/new_sha.