fix(node): close under-withholding via full ref scope and reachable-only pin set#46
fix(node): close under-withholding via full ref scope and reachable-only pin set#46beardthelion wants to merge 5 commits into
Conversation
|
Warning Review limit reached
More reviews will be available in 7 minutes and 34 seconds. Learn how PR review limits work. To continue reviewing without waiting, enable usage-based billing in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (9)
📝 WalkthroughWalkthroughThis PR implements end-to-end encryption and withholding for Phase 3 confidentiality: withheld blobs are sealed to recipient keys via ephemeral X25519 envelopes, encrypted with XChaCha20Poly1305, pinned to IPFS with opaque recipient tags, anchored to Arweave, replicated across mirrors using promisor-based filtering, and recovered locally via a new ChangesCore encryption and visibility foundations
Server-side filtered pack serving and encrypted blob management
Mirror mode classification and encrypted blob replication
Client-side clone command with recovery paths
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (4)
crates/gl/src/clone.rs (1)
262-267: ⚡ Quick winParse
/encrypted-blobsinto a typed response instead ofValue.A missing or mistyped
blobsfield currently degrades to[]and silently disables node-side recovery. Making this schema-strict would catch server/client contract drift the same wayWithheldPathsResponsealready does.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gl/src/clone.rs` around lines 262 - 267, Replace the loose serde_json::Value parsing for resp in clone.rs with a strongly typed EncryptedBlobsResponse struct so schema mismatches surface as errors: define EncryptedBlobsResponse { blobs: Vec<EncryptedBlobOrString> } (matching the server shape, analogous to WithheldPathsResponse), then call resp.json::<EncryptedBlobsResponse>().await.context("parsing encrypted-blobs")? and use the returned .blobs directly instead of body.get(...).and_then(...).cloned().unwrap_or_default(); this ensures missing or mistyped "blobs" no longer silently becomes [] and will fail fast (update the EncryptedBlobOrString type to the actual blob element type used).crates/gitlawb-node/src/api/encrypted.rs (1)
109-123: ⚡ Quick winAdd a handler test for the 404/404 access-control contract.
The new test only locks the wire shape. A small handler-level test that exercises both “caller denied” and “OID missing” on
get_encrypted_blobwould protect the deliberate non-oracle behavior from future refactors.Based on learnings, this endpoint should keep “not found” and “not authorized” externally indistinguishable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/encrypted.rs` around lines 109 - 123, Add a handler-level test for the access-control contract of get_encrypted_blob that verifies the external response is indistinguishable when the caller is denied versus when the OID is missing: write two tests (or one parameterized test) invoking the get_encrypted_blob handler — one scenario where the auth check fails (simulate "caller denied") and one where the blob store returns NotFound (simulate "OID missing") — and assert both responses have the same HTTP status and body (e.g., both return 404/not-found) to ensure the endpoint does not leak authorization state; reference the get_encrypted_blob handler in your test and reuse existing test scaffolding in the encrypted.rs tests module.Source: Learnings
crates/gitlawb-core/src/encrypt.rs (1)
66-88: ⚡ Quick winZeroize transient key material in
seal_blobandopen_blob.
content_key, the seed-derived X25519 secret, and the unwrapped content key all live in plain stack/heap buffers until drop. In this confidentiality path that is avoidable exposure; keep them inZeroizing(or explicitly callzeroize()) once the ciphers are initialized.Hardening sketch
+use zeroize::Zeroizing; + - let mut content_key = [0u8; 32]; - OsRng.fill_bytes(&mut content_key); + let mut content_key = Zeroizing::new([0u8; 32]); + OsRng.fill_bytes(&mut *content_key); let body_cipher = XChaCha20Poly1305::new_from_slice(&content_key) .map_err(|e| anyhow::anyhow!("content key: {e}"))?; ... - let my_x = XSecret::from(x25519_secret_from_seed(&keypair.seed_bytes())); + let seed = keypair.to_seed(); + let my_x = XSecret::from(x25519_secret_from_seed(&seed)); ... - let content_key = content_key.context("not a recipient of this envelope")?; + let content_key = Zeroizing::new(content_key.context("not a recipient of this envelope")?);Also applies to: 126-167
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-core/src/encrypt.rs` around lines 66 - 88, In seal_blob and open_blob, sensitive key material (the 32-byte content_key, the ephemeral XSecret `eph`, the seed-derived X25519 secret, and any unwrapped content key buffers) must be zeroed after use: wrap stack/heap buffers in zeroize::Zeroizing (e.g., Zeroizing<[u8;32]>, Zeroizing<Vec<u8>>) or explicitly call zeroize() on types that implement Zeroize immediately after the ciphers are initialized and after encryption/decryption completes; ensure you zero `content_key`, the ephemeral secret `eph` (or call its zeroize method), the seed-derived X25519 secret produced by x25519_private/related variable, and the unwrapped content key before returning or pushing into Recipient so no plaintext key remains in memory.crates/gitlawb-node/src/git/visibility_pack.rs (1)
101-130: 🏗️ Heavy liftAvoid re-traversing the repo to derive recipients.
withheld_blob_recipients()does one fullblob_paths()sweep throughwithheld_blob_oids()and then a second fullblob_paths()sweep for the recipient pass. On the push path,git_receive_pack()has already computed the withheld set before it gets here, so repos with visibility rules now pay up to three completels-treetraversals per push. Threading the precomputed withheld/blob-path data into this API would keep recipient derivation single-pass.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/git/visibility_pack.rs` around lines 101 - 130, withheld_blob_recipients currently re-traverses the repo by calling withheld_blob_oids(...) and then blob_paths(...), causing multiple ls-tree sweeps; change withheld_blob_recipients signature to accept precomputed withheld data (e.g., Option<&HashSet<String>> or Option<&HashMap<String,String>> representing withheld oids or oid->path map) so callers like git_receive_pack can pass the already-computed set/map, skip the internal withheld_blob_oids call when provided, and iterate the provided oid->path mapping (or filter blob_paths once) to build recipients; update callers (notably git_receive_pack) to supply the precomputed data and keep behavior identical when None is passed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-core/src/identity.rs`:
- Around line 55-59: The new seed_bytes() method leaks a non-zeroizing copy of
the Ed25519 seed; replace its usage with the existing to_seed() which returns a
Zeroizing<[u8;32]>, or change seed_bytes() to return a zeroizing wrapper (e.g.
Zeroizing<[u8;32]>) instead of plain [u8;32]; update callers such as the code
paths that call seed_bytes() (including open_blob and the encrypted-pin task in
crates/gitlawb-node/src/api/repos.rs) to accept and use the Zeroizing seed or to
call to_seed() directly so the raw seed is never held in plain memory.
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 398-425: Short-circuit the expensive withheld walk by detecting
when the repo's rules cannot withhold any path-scoped blobs (e.g., rules is
empty or only applies to root) before calling
visibility_pack::withheld_blob_oids; if that check shows no possible withheld
blobs, skip the tokio::task::spawn_blocking(...) and call
smart_http::upload_pack(&disk_path, body).await directly, otherwise proceed to
spawn_blocking and then choose between smart_http::upload_pack and
smart_http::upload_pack_excluding based on the resulting withheld set; use the
existing variables and functions (rules, visibility_pack::withheld_blob_oids,
withheld, smart_http::upload_pack, smart_http::upload_pack_excluding) to
implement the early return.
In `@crates/gitlawb-node/src/encrypted_pin.rs`:
- Around line 77-80: Replace silent `continue` branches when
`crate::git::store::read_object(repo_path, oid)` and the IPFS pin operation fail
with warning logs that include contextual details (repo_path, oid, and the
error/result) so failures are observable; in the match arms around `let data =
match crate::git::store::read_object(repo_path, oid) { ... }` and the subsequent
IPFS pin match, call the module’s logger (e.g., process_logger.warn or
log::warn) with a concise message and the error/result, then continue as before
to preserve behavior.
- Around line 72-76: The current code builds `keys: Vec<VerifyingKey>` by
`filter_map(|d| did_to_key(d))`, which silently drops unresolved DIDs and
proceeds to encrypt to a partial recipient set; change this so any unresolved
DID causes the blob to be skipped: instead of `filter_map`, resolve all DIDs and
detect failures (e.g., collect Options or compare `keys.len()` to `dids.len()`
after mapping via `did_to_key`), and when a mismatch is found log a warning
referencing `oid` (similar to the existing log) and `continue` so you do not
seal to a partial set; ensure you keep the `VerifyingKey` handling consistent
with downstream code that expects `keys`.
In `@crates/gitlawb-node/src/ipfs_pin.rs`:
- Around line 75-85: The cat function creates a reqwest::Client with no timeout
(reqwest::Client::new()) which can hang get_encrypted_blob and the sync worker;
modify cat (or callers) to use a client with a bounded timeout (e.g., build a
Client via reqwest::Client::builder().timeout(Duration::from_secs(...)) and
reuse it) or apply a per-request timeout (e.g., wrap the send/bytes await with
tokio::time::timeout) so the POST to {}/api/v0/cat?arg={} fails fast on
slow/stalled Kubo; update the cat signature or module to accept/reuse a Client
or enforce a timeout around resp = client.post(&url).send().await and
resp.bytes().await to avoid indefinite hangs.
In `@crates/gitlawb-node/src/sync.rs`:
- Around line 181-188: The code currently treats a None result from
fetch_withheld as MirrorMode::Plain which incorrectly downgrades unknown/error
states; instead, check the Option returned by fetch_withheld and only set
MirrorMode::Plain when you get Some(empty); if fetch_withheld returns None
preserve the repository's existing promisor/partialclone state (do not call
classify_mirror on None). Concretely, change the branch around
fetch_withheld/classify_mirror to: if withheld.is_some() { let mode =
classify_mirror(withheld); } else { read the current repo promisor/partialclone
setting from the local repo at local_path (or otherwise keep the current
MirrorMode) and use that mode for fetch_repo/clone_repo }, so
fetch_repo/clone_repo are invoked with the preserved mode rather than
unconditionally using MirrorMode::Plain.
In `@crates/gl/src/clone.rs`:
- Around line 428-459: The current GraphQL fetch (the local `query` +
`gql_body`, use of `client.post(...).json(&gql_body).send().await`, and
subsequent `parse_tx_ids(&gql)`) only requests first:100 and stops, causing
missing manifests; change the logic to paginate: rewrite the GraphQL query to
take variables for "first" and "after" (cursor), then loop POSTing the query
(building `gql_body` with `{ "repo": slug, "first": batch_size, "after": cursor
}`), call `parse_tx_ids` (or adjust it to return edges + pageInfo) to collect tx
ids across pages, update `cursor` from pageInfo.nextCursor/hasNextPage until no
more pages, then proceed to fetch manifests and call
`merge_manifests(manifests)` as before; ensure the loop uses the same
`client.post(...).json(&gql_body).send().await` error handling as existing code
and retains the existing behavior when responses fail.
- Around line 415-416: The slug normalization and Arweave discovery/parsing need
tightening: ensure owner short-form uses the same split logic as the node by
keeping owner_short = owner.split(':').next_back().unwrap_or(owner) and forming
slug = format!("{owner_short}/{name}"); update the GraphQL discovery for
encrypted-manifest anchors in discover/recovery code to paginate instead of
hard-coding first:100 (iterate queries until no more edges or timestamps
covered) so recover_encrypted_blobs sees all anchors; change
recover_encrypted_blobs to deserialize the /encrypted-blobs response into a
strict typed struct (like the existing withheld-paths type) and fail on
missing/wrong fields instead of using Value with unwrap_or_default(), ensuring
schema-strict parsing and surfaced errors.
---
Nitpick comments:
In `@crates/gitlawb-core/src/encrypt.rs`:
- Around line 66-88: In seal_blob and open_blob, sensitive key material (the
32-byte content_key, the ephemeral XSecret `eph`, the seed-derived X25519
secret, and any unwrapped content key buffers) must be zeroed after use: wrap
stack/heap buffers in zeroize::Zeroizing (e.g., Zeroizing<[u8;32]>,
Zeroizing<Vec<u8>>) or explicitly call zeroize() on types that implement Zeroize
immediately after the ciphers are initialized and after encryption/decryption
completes; ensure you zero `content_key`, the ephemeral secret `eph` (or call
its zeroize method), the seed-derived X25519 secret produced by
x25519_private/related variable, and the unwrapped content key before returning
or pushing into Recipient so no plaintext key remains in memory.
In `@crates/gitlawb-node/src/api/encrypted.rs`:
- Around line 109-123: Add a handler-level test for the access-control contract
of get_encrypted_blob that verifies the external response is indistinguishable
when the caller is denied versus when the OID is missing: write two tests (or
one parameterized test) invoking the get_encrypted_blob handler — one scenario
where the auth check fails (simulate "caller denied") and one where the blob
store returns NotFound (simulate "OID missing") — and assert both responses have
the same HTTP status and body (e.g., both return 404/not-found) to ensure the
endpoint does not leak authorization state; reference the get_encrypted_blob
handler in your test and reuse existing test scaffolding in the encrypted.rs
tests module.
In `@crates/gitlawb-node/src/git/visibility_pack.rs`:
- Around line 101-130: withheld_blob_recipients currently re-traverses the repo
by calling withheld_blob_oids(...) and then blob_paths(...), causing multiple
ls-tree sweeps; change withheld_blob_recipients signature to accept precomputed
withheld data (e.g., Option<&HashSet<String>> or Option<&HashMap<String,String>>
representing withheld oids or oid->path map) so callers like git_receive_pack
can pass the already-computed set/map, skip the internal withheld_blob_oids call
when provided, and iterate the provided oid->path mapping (or filter blob_paths
once) to build recipients; update callers (notably git_receive_pack) to supply
the precomputed data and keep behavior identical when None is passed.
In `@crates/gl/src/clone.rs`:
- Around line 262-267: Replace the loose serde_json::Value parsing for resp in
clone.rs with a strongly typed EncryptedBlobsResponse struct so schema
mismatches surface as errors: define EncryptedBlobsResponse { blobs:
Vec<EncryptedBlobOrString> } (matching the server shape, analogous to
WithheldPathsResponse), then call
resp.json::<EncryptedBlobsResponse>().await.context("parsing encrypted-blobs")?
and use the returned .blobs directly instead of
body.get(...).and_then(...).cloned().unwrap_or_default(); this ensures missing
or mistyped "blobs" no longer silently becomes [] and will fail fast (update the
EncryptedBlobOrString type to the actual blob element type used).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: fab81d81-dc64-46fe-b486-709131c8a108
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (23)
crates/gitlawb-core/Cargo.tomlcrates/gitlawb-core/src/encrypt.rscrates/gitlawb-core/src/identity.rscrates/gitlawb-core/src/lib.rscrates/gitlawb-node/Cargo.tomlcrates/gitlawb-node/src/api/encrypted.rscrates/gitlawb-node/src/api/mod.rscrates/gitlawb-node/src/api/repos.rscrates/gitlawb-node/src/api/visibility.rscrates/gitlawb-node/src/arweave.rscrates/gitlawb-node/src/db/mod.rscrates/gitlawb-node/src/encrypted_pin.rscrates/gitlawb-node/src/git/mod.rscrates/gitlawb-node/src/git/smart_http.rscrates/gitlawb-node/src/git/visibility_pack.rscrates/gitlawb-node/src/ipfs_pin.rscrates/gitlawb-node/src/main.rscrates/gitlawb-node/src/pinata.rscrates/gitlawb-node/src/server.rscrates/gitlawb-node/src/sync.rscrates/gitlawb-node/src/visibility.rscrates/gl/src/clone.rscrates/gl/src/main.rs
|
hello @beardthelion do you want to be a maintainer of this node repository? I would like to add you as one. let me know if you are interested. |
|
Thanks @kevincodex1, I'd be glad to. I've enjoyed working on this and I'm happy to take on more responsibility for it. Couple of things so I know what I'm signing up for: what's the expectation for the role, just merge access, or also triage/releases and general upkeep? And who else maintains it currently? Either way, count me in. |
|
please rebase and fix conflicts |
90dc8d4 to
67afb20
Compare
…adability
The origin no longer stores recipient DIDs. Migration v5 replaces the
encrypted_blobs.recipients column with an opaque, node-keyed recipients_tag
used only to detect a recipient-set change for re-seal. B1 discovery and fetch
are now gated by the same repo-readability check the git read path uses, not by
per-recipient matching; decryption is gated by the envelope crypto, so a
non-recipient who can read the repo sees a blob's {oid, cid} but cannot open it.
encrypt_and_pin keys the tag from the node seed and returns {oid, cid}; the
Arweave manifest tuple drops the now-unused recipient vec.
A DB compromise no longer reveals the reader set; recovering it would require
brute-forcing candidate DID sets against the keyed tag with the node key.
…rror
Address review on the at-rest blinding change:
- The encrypted-blobs/replicate listing returned {oid, cid} with no visibility
check, so a non-readable repo's blob index was reachable by an unauthenticated
caller who guessed {owner}/{repo}. Gate it by the same repo-readability check
discovery and fetch use. For the intended case (a public repo with withheld
subtrees) the public root keeps this open to peers; only fully non-readable
repos are withheld, which is the desired behavior.
- encrypt_and_pin treated a recipients_tag DB read error as a cache miss and
resealed, causing avoidable IPFS writes during a partial outage; skip and retry
on the next push instead.
- Correct the get_encrypted_blob doc comment to describe repo-readability access.
…nly pin set blob_paths walked only refs/heads/* and refs/tags/* and skipped silently on a failed git ls-tree, so a blob reachable only through another namespace, or a ref that failed to traverse, could fall out of the withheld set and ship in cleartext. Walk every ref and fail closed on traversal error. The pin enumerators (ipfs_pin, pinata) used git cat-file --batch-all-objects, which includes unreachable/dangling objects that have no path and cannot be classified for withholding. Switch them to git rev-list --objects --all so the pin set matches the reachable graph blob_paths evaluates.
67afb20 to
cd5cda4
Compare
…tate Two correctness fixes flagged in review on this branch. encrypted_pin: encrypt_and_pin built its recipient key list with filter_map, silently dropping any DID that does not resolve to an inline key (did:web / did:gitlawb) and sealing to the remaining subset. Because recipients_tag is computed over the full DID set, the blob would never be re-sealed once the dropped DID later resolved, permanently locking that authorized reader out. resolve_recipient_keys now returns None if any DID is unresolvable, and the blob is skipped (and retried next push) rather than sealed to a partial set. sync: fetch_withheld returns None on any 404/5xx/network/parse error, and classify_mirror collapsed None into MirrorMode::Plain. On an existing promisor mirror that downgrade strips the partial-clone config and --refetches a still-withheld repo, which fails, so a transient withheld-paths outage broke syncs until the endpoint recovered. decide_mode now preserves an existing promisor mirror's mode when the lookup is unknown; fresh-clone fail-closed behavior is unchanged.
|
Rebased onto main and conflicts resolved. Branch fast-forwards cleanly on the current main (tip 48c45bc), no conflicts remaining. Ready for review/merge. |
Closes #42.
blob_pathsinvisibility_pack.rsbuilds the blob→path map that every withholding decision rests on. Two flaws let it produce an incomplete withheld set, which downstream code then treats as public, leaking withheld content in cleartext:Ref-namespace gap. It walked only
refs/heads/*andrefs/tags/*, but the serve path enumeratesgit rev-list --objects --all(every object reachable from any ref). A blob reachable only through another namespace (e.g.refs/notes/*) never entered the withheld set and shipped in cleartext. Fix: drop the heads/tags filter and walk every ref, still per-ref viagit ls-tree -rso all paths a blob lives at are seen (a blob is withheld only if denied at every path; if it is public at some allowed path it must still be sent). Note this is deliberately notrev-list --objects, which reports one representative path per object and would regress that correctness.Fail-open ref walk. A non-zero
git ls-treeexit was silently skipped, yielding a partial set. Fix: fail closed (bail!). All three callers already abort on error (serve propagates, push/encrypt-pin skip the block), so no path serves unfiltered.It also narrows the two pin enumerators (
ipfs_pin,pinata) fromgit cat-file --batch-all-objectstogit rev-list --objects --all. The old call included unreachable/dangling objects, which have no ref and no tree path, so visibility rules cannot classify them; a dangling copy of a secret blob would bypass withholding straight into a permanent IPFS pin. Reachable-only keeps the pin set aligned with whatblob_pathscan evaluate.Four tests added: a blob reachable only via a non-head/tag ref is still withheld; an untraversable ref makes the withheld computation return
Err; and a dangling blob is absent from each pin enumerator while a committed blob is present.Stacks on #40 (base=main), carrying the cumulative Phase-2/3 prereq diff since these files exist only in the unmerged stack.
Summary by CodeRabbit
gl clonecommand supporting partial clones with content filtering and automatic blob recovery