Skip to content

fix(node): close under-withholding via full ref scope and reachable-only pin set#46

Closed
beardthelion wants to merge 5 commits into
Gitlawb:mainfrom
beardthelion:fix/withholding-ref-scope
Closed

fix(node): close under-withholding via full ref scope and reachable-only pin set#46
beardthelion wants to merge 5 commits into
Gitlawb:mainfrom
beardthelion:fix/withholding-ref-scope

Conversation

@beardthelion

@beardthelion beardthelion commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

Closes #42.

blob_paths in visibility_pack.rs builds the blob→path map that every withholding decision rests on. Two flaws let it produce an incomplete withheld set, which downstream code then treats as public, leaking withheld content in cleartext:

  1. Ref-namespace gap. It walked only refs/heads/* and refs/tags/*, but the serve path enumerates git rev-list --objects --all (every object reachable from any ref). A blob reachable only through another namespace (e.g. refs/notes/*) never entered the withheld set and shipped in cleartext. Fix: drop the heads/tags filter and walk every ref, still per-ref via git ls-tree -r so all paths a blob lives at are seen (a blob is withheld only if denied at every path; if it is public at some allowed path it must still be sent). Note this is deliberately not rev-list --objects, which reports one representative path per object and would regress that correctness.

  2. Fail-open ref walk. A non-zero git ls-tree exit was silently skipped, yielding a partial set. Fix: fail closed (bail!). All three callers already abort on error (serve propagates, push/encrypt-pin skip the block), so no path serves unfiltered.

It also narrows the two pin enumerators (ipfs_pin, pinata) from git cat-file --batch-all-objects to git rev-list --objects --all. The old call included unreachable/dangling objects, which have no ref and no tree path, so visibility rules cannot classify them; a dangling copy of a secret blob would bypass withholding straight into a permanent IPFS pin. Reachable-only keeps the pin set aligned with what blob_paths can evaluate.

Four tests added: a blob reachable only via a non-head/tag ref is still withheld; an untraversable ref makes the withheld computation return Err; and a dangling blob is absent from each pin enumerator while a committed blob is present.

Stacks on #40 (base=main), carrying the cumulative Phase-2/3 prereq diff since these files exist only in the unmerged stack.

Summary by CodeRabbit

  • New Features
    • Added encrypted storage and retrieval for withheld blobs
    • New gl clone command supporting partial clones with content filtering and automatic blob recovery
    • Added API endpoints for querying encrypted blobs and withheld paths
    • Enhanced sync functionality to replicate encrypted content across mirrors

@coderabbitai

coderabbitai Bot commented Jun 13, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@beardthelion, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 7 minutes and 34 seconds. Learn how PR review limits work.

To continue reviewing without waiting, enable usage-based billing in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: dbb2764e-6c45-4bce-b668-0f226104b749

📥 Commits

Reviewing files that changed from the base of the PR and between cc30427 and 48c45bc.

📒 Files selected for processing (9)
  • crates/gitlawb-node/src/api/encrypted.rs
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/arweave.rs
  • crates/gitlawb-node/src/db/mod.rs
  • crates/gitlawb-node/src/encrypted_pin.rs
  • crates/gitlawb-node/src/git/visibility_pack.rs
  • crates/gitlawb-node/src/ipfs_pin.rs
  • crates/gitlawb-node/src/pinata.rs
  • crates/gitlawb-node/src/sync.rs
📝 Walkthrough

Walkthrough

This PR implements end-to-end encryption and withholding for Phase 3 confidentiality: withheld blobs are sealed to recipient keys via ephemeral X25519 envelopes, encrypted with XChaCha20Poly1305, pinned to IPFS with opaque recipient tags, anchored to Arweave, replicated across mirrors using promisor-based filtering, and recovered locally via a new gl clone command with node and Arweave fallback paths.

Changes

Core encryption and visibility foundations

Layer / File(s) Summary
Envelope encryption with X25519 key agreement
crates/gitlawb-core/src/encrypt.rs, crates/gitlawb-core/Cargo.toml
New seal_blob/open_blob module implements multi-recipient envelope encryption: seal_blob generates random content key, encrypts plaintext with XChaCha20Poly1305, wraps content key per recipient via ephemeral X25519 and crypto_box, and outputs magic+version+header length+blinded JSON header+encrypted body. open_blob validates envelope structure, derives X25519 secret from keypair seed, attempts AEAD decryption per recipient entry, and decrypts body. Tests cover key agreement, round-trips, tampering, header blinding, and nonce-length protection. Dependencies: curve25519-dalek, crypto_box, chacha20poly1305.
Keypair seed exposure and module wiring
crates/gitlawb-core/src/identity.rs, crates/gitlawb-core/src/lib.rs, crates/gitlawb-node/Cargo.toml
Keypair::seed_bytes() returns raw 32-byte Ed25519 seed for X25519 secret derivation. Encrypt module exported in crate root. Node workspace adds ed25519-dalek dependency.
Visibility-aware blob OID and recipient computation
crates/gitlawb-node/src/git/visibility_pack.rs
New module enumerates reachable blobs via git ls-tree -r per ref, computes withheld OID sets by evaluating visibility rules at each blob path (fail-closed on ref-walk errors), filters object lists for replication, and derives per-blob recipient DIDs as owner_did + reader DIDs allowed at any path. Comprehensive tests cover anonymous/owner/reader contexts, recipient computation, seal/open round-trips, and nonstandard ref namespaces.
Sparse checkout glob helpers and announce gate logic
crates/gitlawb-node/src/visibility.rs
New withheld_globs and reincluded_globs functions compute glob patterns for sparse-checkout exclusion/re-inclusion. Tests validate announce-gate logic (anonymous read of "/" gates replication announcement).

Server-side filtered pack serving and encrypted blob management

Layer / File(s) Summary
Filtered pack generation and git protocol v0 framing
crates/gitlawb-node/src/git/smart_http.rs
New build_filtered_pack shells out to rev-list/pack-objects to exclude withheld OIDs. upload_pack_excluding serves filtered packs in blocking task with protocol v0 NAK framing, optional side-band-64k band-1 wrapping, and comprehensive unit/integration tests using local Axum server, real git clone --filter=blob:none, and incremental fetch scenarios.
Upload-pack visibility enforcement in repos.rs
crates/gitlawb-node/src/api/repos.rs (partial)
git_upload_pack computes withheld blob OIDs via spawn_blocking, routes to normal or filtered pack handler, and remaps errors (protocol issues → BadRequest, git failures → Git).
Encrypt-and-pin with recipient derivation and DB recording
crates/gitlawb-node/src/encrypted_pin.rs, crates/gitlawb-node/src/main.rs
New module computes opaque HMAC-SHA256 recipients_tag (node-keyed, order-insensitive); skips re-sealing when tag matches DB record; otherwise resolves recipient DIDs to keys, reads blob bytes, seals via core encrypt module, pins envelope to IPFS, and records (oid, cid, tag). Best-effort per-blob error handling with logging. Tests validate tag properties and keying.
IPFS and Pinata object filtering and CID retrieval
crates/gitlawb-node/src/ipfs_pin.rs, crates/gitlawb-node/src/pinata.rs
Both modules now accept withheld: &HashSet<String> parameter, filter enumerated objects through replicable_objects, and update list_all_objects to use git rev-list --objects --all instead of batch-all-objects (only reachable objects). IPFS adds new cat function to fetch CID bytes. Tests assert unreachable blobs excluded.
Per-push encrypted manifest anchoring to Arweave
crates/gitlawb-node/src/arweave.rs
New EncryptedManifest struct and anchor_encrypted_manifest function build JSON payload with repo/owner/node/timestamp metadata and blob entries (oid, cid only; no recipients), send to Irys endpoint, return transaction id. No-ops when irys_url empty or blob list empty. Tests mock successful anchoring and validate missing recipient fields.
Receive-pack Phase 2 replication enforcement and propagation gating
crates/gitlawb-node/src/api/repos.rs (main change)
New replication logic in git_receive_pack determines announce eligibility and optional withheld set from visibility rules. When announcable, conditionally spawns IPFS pinning of new objects, calls encrypt_and_pin, and attempts manifest anchoring (all best-effort). Downstream gossipsub and HTTP peer sync notifications are suppressed unless announce true. Arweave permanent anchoring gated by announce && !irys_url.is_empty().
Database migrations and encrypted blob query/write methods
crates/gitlawb-node/src/db/mod.rs
Schema v4 adds encrypted_blobs(repo_id, oid) with cid and recipients column plus index. Schema v5 drops recipients column, adds recipients_tag with non-null default. Db impl adds record_encrypted_blob upsert, list_all_encrypted_blobs, encrypted_blob_cid, and encrypted_blob_recipients_tag read methods.
REST API endpoints for encrypted blob operations
crates/gitlawb-node/src/api/encrypted.rs, crates/gitlawb-node/src/api/mod.rs, crates/gitlawb-node/src/api/visibility.rs
New handlers: list_encrypted_blobs returns {oid, cid} pairs visible to caller (visibility-gated); get_encrypted_blob fetches encrypted envelope bytes from IPFS; replicate_encrypted_blobs returns replication-format JSON (no recipients); withheld_paths returns withheld and reinclude globs (whole-repo visibility gate, no reader_dids). Unit tests validate JSON schema.
HTTP router registration for encrypted blob and visibility endpoints
crates/gitlawb-node/src/server.rs
New GET routes under /api/v1/repos/{owner}/{repo}/ for withheld-paths, encrypted-blobs list/get/replicate wired to visibility and encrypted handlers.

Mirror mode classification and encrypted blob replication

Layer / File(s) Summary
Mirror mode selection and withheld-paths fetching
crates/gitlawb-node/src/sync.rs (Part 1)
Introduces MirrorMode enum (Plain vs Promisor), fetch_withheld HTTP query to origin /withheld-paths (returns None on non-success/network/parse), and classify_mirror to select strategy. Also adds ReplicaBlob/ReplicateResponse types and blobs_needing_replication to decide which envelopes need (re)replication. Reusable reqwest::Client with 30s timeout passed through batch processing.
Encrypted blob replication for mirrors (Option B2)
crates/gitlawb-node/src/sync.rs (Part 2)
New replicate_encrypted_blobs function fetches blob listings from origin, compares against local DB state by OID, pulls missing/changed envelopes via IPFS cat, validates CID match, and records rows keyed by mirror repo_id. Best-effort per-blob with logging; no-ops when IPFS unconfigured, endpoint absent, or no blobs returned.
Promisor clone/fetch with mode transitions
crates/gitlawb-node/src/sync.rs (Part 3)
Updates clone_repo and fetch_repo to accept MirrorMode, apply --filter=blob:limit=10g and promisor/partialclonefilter config for promisor mode, and detect/unset promisor settings on transition back to plain mode (refetch with --refetch --prune). Helpers: git_run (stderr on error), git_run_lenient (ignore failures), git_config_get for promisor detection. Tests cover mode classification, replication decision, and promisor↔plain transitions.
Sync worker loop integration and client construction
crates/gitlawb-node/src/sync.rs (Part 4)
Per-item sync calls fetch_withheld, classifies mirror mode, routes to clone/fetch variants, then calls replicate_encrypted_blobs. Signature changes to process_batch accept client parameter.

Client-side clone command with recovery paths

Layer / File(s) Summary
CLI parsing, partial clone, and sparse checkout configuration
crates/gl/src/clone.rs, crates/gl/src/main.rs
New CloneArgs struct. setup_partial_clone clones with --filter=blob:none --no-checkout, writes non-cone sparse-checkout excluding withheld globs and re-including nested allowed paths, performs branch checkout (defaults to origin/HEAD). parse_repo normalizes gitlawb:/// and bare / formats (supports DID owners with colons). Main adds Clone subcommand routing.
Node-based encrypted blob recovery via HTTP
crates/gl/src/clone.rs (Part 1)
fetch_withheld queries node /withheld-paths, selecting signed vs unsigned endpoint based on local key, returning empty globs on 404/501. recover_encrypted_blobs fetches /encrypted-blobs list and per-blob envelopes, decrypts via open_blob, installs via hash-object -w, verifies OID match, records repo-relative paths. Best-effort per-blob error handling.
Arweave/IPFS fallback recovery with manifest discovery
crates/gl/src/clone.rs (Part 2)
Types: ManifestBlob, Manifest for deserialization. parse_tx_ids extracts Arweave GraphQL transaction ids. recover_from_arweave discovers encrypted manifest transactions via GraphQL, fetches/parses each manifest, merges by latest RFC3339 timestamp, then for each missing object (guarded by GIT_NO_LAZY_FETCH) downloads envelope from IPFS CID endpoint, decrypts with caller key, installs/verifies blob OIDs. Best-effort per-blob; skips on gateway/parse/timeout failures.
Clone flow orchestration and sparse-checkout materialization
crates/gl/src/clone.rs (Part 3)
run() function validates destination non-existence, fetches withheld/reinclude globs, performs partial clone setup, conditionally runs node-based recovery then Arweave fallback (when keypair available), updates sparse-checkout to re-materialize recovered paths for sparse clones, attempts checkout, prints messages. Comprehensive tests cover sparse patterns, response parsing, manifest merging, and end-to-end Arweave/IPFS scenarios with mocked gateways.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

  • Gitlawb/node#42: Raised Phase 3 withholding issues with incomplete withheld sets from ref-namespace scope (heads/tags only) and fail-open ref traversal that could leak cleartext blobs; this PR's visibility_pack.rs implementation enumerates blobs only from heads/tags and skips ref traversal errors silently, directly matching the issue's failure modes.
  • Gitlawb/node#41: Calls for hardening Keypair::seed_bytes() to return a zeroizing wrapper; this PR exposes the method for X25519 derivation but does not yet zeroize the returned bytes, leaving the seed in memory.

Possibly related PRs

  • Gitlawb/node#25: Introduced path-scoped visibility rules and visibility_check logic; this PR extends that work by adding withheld_globs/reincluded_globs helpers and a new /withheld-paths endpoint that gate encrypted blob withholding and replication.

Suggested reviewers

  • kevincodex1

Poem

🐰 Blobs wrapped in X25519 envelopes bright,
Withheld by the rules, hidden from sight,
Pinned secret to IPFS, sealed tight,
Arweave anchors guide the recovery flight,
Clone sparse and recover—all's right! 🔐

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and specifically describes the main fix: closing under-withholding by improving ref scope coverage and using reachable-only pin sets.
Linked Issues check ✅ Passed All coding requirements from issue #42 are met: full-ref enumeration replaces heads/tags-only walk in visibility_pack.rs, ref-walk failures now fail closed, and pin enumerators are aligned with reachable graph via git rev-list.
Out of Scope Changes check ✅ Passed All changes directly support fixing the incomplete withheld set vulnerability. Encryption/decryption support and new APIs are necessary infrastructure for the Phase 2/3 replication and withholding system being deployed.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (4)
crates/gl/src/clone.rs (1)

262-267: ⚡ Quick win

Parse /encrypted-blobs into a typed response instead of Value.

A missing or mistyped blobs field currently degrades to [] and silently disables node-side recovery. Making this schema-strict would catch server/client contract drift the same way WithheldPathsResponse already does.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gl/src/clone.rs` around lines 262 - 267, Replace the loose
serde_json::Value parsing for resp in clone.rs with a strongly typed
EncryptedBlobsResponse struct so schema mismatches surface as errors: define
EncryptedBlobsResponse { blobs: Vec<EncryptedBlobOrString> } (matching the
server shape, analogous to WithheldPathsResponse), then call
resp.json::<EncryptedBlobsResponse>().await.context("parsing encrypted-blobs")?
and use the returned .blobs directly instead of
body.get(...).and_then(...).cloned().unwrap_or_default(); this ensures missing
or mistyped "blobs" no longer silently becomes [] and will fail fast (update the
EncryptedBlobOrString type to the actual blob element type used).
crates/gitlawb-node/src/api/encrypted.rs (1)

109-123: ⚡ Quick win

Add a handler test for the 404/404 access-control contract.

The new test only locks the wire shape. A small handler-level test that exercises both “caller denied” and “OID missing” on get_encrypted_blob would protect the deliberate non-oracle behavior from future refactors.

Based on learnings, this endpoint should keep “not found” and “not authorized” externally indistinguishable.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/api/encrypted.rs` around lines 109 - 123, Add a
handler-level test for the access-control contract of get_encrypted_blob that
verifies the external response is indistinguishable when the caller is denied
versus when the OID is missing: write two tests (or one parameterized test)
invoking the get_encrypted_blob handler — one scenario where the auth check
fails (simulate "caller denied") and one where the blob store returns NotFound
(simulate "OID missing") — and assert both responses have the same HTTP status
and body (e.g., both return 404/not-found) to ensure the endpoint does not leak
authorization state; reference the get_encrypted_blob handler in your test and
reuse existing test scaffolding in the encrypted.rs tests module.

Source: Learnings

crates/gitlawb-core/src/encrypt.rs (1)

66-88: ⚡ Quick win

Zeroize transient key material in seal_blob and open_blob.

content_key, the seed-derived X25519 secret, and the unwrapped content key all live in plain stack/heap buffers until drop. In this confidentiality path that is avoidable exposure; keep them in Zeroizing (or explicitly call zeroize()) once the ciphers are initialized.

Hardening sketch
+use zeroize::Zeroizing;
+
-    let mut content_key = [0u8; 32];
-    OsRng.fill_bytes(&mut content_key);
+    let mut content_key = Zeroizing::new([0u8; 32]);
+    OsRng.fill_bytes(&mut *content_key);
     let body_cipher = XChaCha20Poly1305::new_from_slice(&content_key)
         .map_err(|e| anyhow::anyhow!("content key: {e}"))?;
...
-    let my_x = XSecret::from(x25519_secret_from_seed(&keypair.seed_bytes()));
+    let seed = keypair.to_seed();
+    let my_x = XSecret::from(x25519_secret_from_seed(&seed));
...
-    let content_key = content_key.context("not a recipient of this envelope")?;
+    let content_key = Zeroizing::new(content_key.context("not a recipient of this envelope")?);

Also applies to: 126-167

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-core/src/encrypt.rs` around lines 66 - 88, In seal_blob and
open_blob, sensitive key material (the 32-byte content_key, the ephemeral
XSecret `eph`, the seed-derived X25519 secret, and any unwrapped content key
buffers) must be zeroed after use: wrap stack/heap buffers in zeroize::Zeroizing
(e.g., Zeroizing<[u8;32]>, Zeroizing<Vec<u8>>) or explicitly call zeroize() on
types that implement Zeroize immediately after the ciphers are initialized and
after encryption/decryption completes; ensure you zero `content_key`, the
ephemeral secret `eph` (or call its zeroize method), the seed-derived X25519
secret produced by x25519_private/related variable, and the unwrapped content
key before returning or pushing into Recipient so no plaintext key remains in
memory.
crates/gitlawb-node/src/git/visibility_pack.rs (1)

101-130: 🏗️ Heavy lift

Avoid re-traversing the repo to derive recipients.

withheld_blob_recipients() does one full blob_paths() sweep through withheld_blob_oids() and then a second full blob_paths() sweep for the recipient pass. On the push path, git_receive_pack() has already computed the withheld set before it gets here, so repos with visibility rules now pay up to three complete ls-tree traversals per push. Threading the precomputed withheld/blob-path data into this API would keep recipient derivation single-pass.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/git/visibility_pack.rs` around lines 101 - 130,
withheld_blob_recipients currently re-traverses the repo by calling
withheld_blob_oids(...) and then blob_paths(...), causing multiple ls-tree
sweeps; change withheld_blob_recipients signature to accept precomputed withheld
data (e.g., Option<&HashSet<String>> or Option<&HashMap<String,String>>
representing withheld oids or oid->path map) so callers like git_receive_pack
can pass the already-computed set/map, skip the internal withheld_blob_oids call
when provided, and iterate the provided oid->path mapping (or filter blob_paths
once) to build recipients; update callers (notably git_receive_pack) to supply
the precomputed data and keep behavior identical when None is passed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-core/src/identity.rs`:
- Around line 55-59: The new seed_bytes() method leaks a non-zeroizing copy of
the Ed25519 seed; replace its usage with the existing to_seed() which returns a
Zeroizing<[u8;32]>, or change seed_bytes() to return a zeroizing wrapper (e.g.
Zeroizing<[u8;32]>) instead of plain [u8;32]; update callers such as the code
paths that call seed_bytes() (including open_blob and the encrypted-pin task in
crates/gitlawb-node/src/api/repos.rs) to accept and use the Zeroizing seed or to
call to_seed() directly so the raw seed is never held in plain memory.

In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 398-425: Short-circuit the expensive withheld walk by detecting
when the repo's rules cannot withhold any path-scoped blobs (e.g., rules is
empty or only applies to root) before calling
visibility_pack::withheld_blob_oids; if that check shows no possible withheld
blobs, skip the tokio::task::spawn_blocking(...) and call
smart_http::upload_pack(&disk_path, body).await directly, otherwise proceed to
spawn_blocking and then choose between smart_http::upload_pack and
smart_http::upload_pack_excluding based on the resulting withheld set; use the
existing variables and functions (rules, visibility_pack::withheld_blob_oids,
withheld, smart_http::upload_pack, smart_http::upload_pack_excluding) to
implement the early return.

In `@crates/gitlawb-node/src/encrypted_pin.rs`:
- Around line 77-80: Replace silent `continue` branches when
`crate::git::store::read_object(repo_path, oid)` and the IPFS pin operation fail
with warning logs that include contextual details (repo_path, oid, and the
error/result) so failures are observable; in the match arms around `let data =
match crate::git::store::read_object(repo_path, oid) { ... }` and the subsequent
IPFS pin match, call the module’s logger (e.g., process_logger.warn or
log::warn) with a concise message and the error/result, then continue as before
to preserve behavior.
- Around line 72-76: The current code builds `keys: Vec<VerifyingKey>` by
`filter_map(|d| did_to_key(d))`, which silently drops unresolved DIDs and
proceeds to encrypt to a partial recipient set; change this so any unresolved
DID causes the blob to be skipped: instead of `filter_map`, resolve all DIDs and
detect failures (e.g., collect Options or compare `keys.len()` to `dids.len()`
after mapping via `did_to_key`), and when a mismatch is found log a warning
referencing `oid` (similar to the existing log) and `continue` so you do not
seal to a partial set; ensure you keep the `VerifyingKey` handling consistent
with downstream code that expects `keys`.

In `@crates/gitlawb-node/src/ipfs_pin.rs`:
- Around line 75-85: The cat function creates a reqwest::Client with no timeout
(reqwest::Client::new()) which can hang get_encrypted_blob and the sync worker;
modify cat (or callers) to use a client with a bounded timeout (e.g., build a
Client via reqwest::Client::builder().timeout(Duration::from_secs(...)) and
reuse it) or apply a per-request timeout (e.g., wrap the send/bytes await with
tokio::time::timeout) so the POST to {}/api/v0/cat?arg={} fails fast on
slow/stalled Kubo; update the cat signature or module to accept/reuse a Client
or enforce a timeout around resp = client.post(&url).send().await and
resp.bytes().await to avoid indefinite hangs.

In `@crates/gitlawb-node/src/sync.rs`:
- Around line 181-188: The code currently treats a None result from
fetch_withheld as MirrorMode::Plain which incorrectly downgrades unknown/error
states; instead, check the Option returned by fetch_withheld and only set
MirrorMode::Plain when you get Some(empty); if fetch_withheld returns None
preserve the repository's existing promisor/partialclone state (do not call
classify_mirror on None). Concretely, change the branch around
fetch_withheld/classify_mirror to: if withheld.is_some() { let mode =
classify_mirror(withheld); } else { read the current repo promisor/partialclone
setting from the local repo at local_path (or otherwise keep the current
MirrorMode) and use that mode for fetch_repo/clone_repo }, so
fetch_repo/clone_repo are invoked with the preserved mode rather than
unconditionally using MirrorMode::Plain.

In `@crates/gl/src/clone.rs`:
- Around line 428-459: The current GraphQL fetch (the local `query` +
`gql_body`, use of `client.post(...).json(&gql_body).send().await`, and
subsequent `parse_tx_ids(&gql)`) only requests first:100 and stops, causing
missing manifests; change the logic to paginate: rewrite the GraphQL query to
take variables for "first" and "after" (cursor), then loop POSTing the query
(building `gql_body` with `{ "repo": slug, "first": batch_size, "after": cursor
}`), call `parse_tx_ids` (or adjust it to return edges + pageInfo) to collect tx
ids across pages, update `cursor` from pageInfo.nextCursor/hasNextPage until no
more pages, then proceed to fetch manifests and call
`merge_manifests(manifests)` as before; ensure the loop uses the same
`client.post(...).json(&gql_body).send().await` error handling as existing code
and retains the existing behavior when responses fail.
- Around line 415-416: The slug normalization and Arweave discovery/parsing need
tightening: ensure owner short-form uses the same split logic as the node by
keeping owner_short = owner.split(':').next_back().unwrap_or(owner) and forming
slug = format!("{owner_short}/{name}"); update the GraphQL discovery for
encrypted-manifest anchors in discover/recovery code to paginate instead of
hard-coding first:100 (iterate queries until no more edges or timestamps
covered) so recover_encrypted_blobs sees all anchors; change
recover_encrypted_blobs to deserialize the /encrypted-blobs response into a
strict typed struct (like the existing withheld-paths type) and fail on
missing/wrong fields instead of using Value with unwrap_or_default(), ensuring
schema-strict parsing and surfaced errors.

---

Nitpick comments:
In `@crates/gitlawb-core/src/encrypt.rs`:
- Around line 66-88: In seal_blob and open_blob, sensitive key material (the
32-byte content_key, the ephemeral XSecret `eph`, the seed-derived X25519
secret, and any unwrapped content key buffers) must be zeroed after use: wrap
stack/heap buffers in zeroize::Zeroizing (e.g., Zeroizing<[u8;32]>,
Zeroizing<Vec<u8>>) or explicitly call zeroize() on types that implement Zeroize
immediately after the ciphers are initialized and after encryption/decryption
completes; ensure you zero `content_key`, the ephemeral secret `eph` (or call
its zeroize method), the seed-derived X25519 secret produced by
x25519_private/related variable, and the unwrapped content key before returning
or pushing into Recipient so no plaintext key remains in memory.

In `@crates/gitlawb-node/src/api/encrypted.rs`:
- Around line 109-123: Add a handler-level test for the access-control contract
of get_encrypted_blob that verifies the external response is indistinguishable
when the caller is denied versus when the OID is missing: write two tests (or
one parameterized test) invoking the get_encrypted_blob handler — one scenario
where the auth check fails (simulate "caller denied") and one where the blob
store returns NotFound (simulate "OID missing") — and assert both responses have
the same HTTP status and body (e.g., both return 404/not-found) to ensure the
endpoint does not leak authorization state; reference the get_encrypted_blob
handler in your test and reuse existing test scaffolding in the encrypted.rs
tests module.

In `@crates/gitlawb-node/src/git/visibility_pack.rs`:
- Around line 101-130: withheld_blob_recipients currently re-traverses the repo
by calling withheld_blob_oids(...) and then blob_paths(...), causing multiple
ls-tree sweeps; change withheld_blob_recipients signature to accept precomputed
withheld data (e.g., Option<&HashSet<String>> or Option<&HashMap<String,String>>
representing withheld oids or oid->path map) so callers like git_receive_pack
can pass the already-computed set/map, skip the internal withheld_blob_oids call
when provided, and iterate the provided oid->path mapping (or filter blob_paths
once) to build recipients; update callers (notably git_receive_pack) to supply
the precomputed data and keep behavior identical when None is passed.

In `@crates/gl/src/clone.rs`:
- Around line 262-267: Replace the loose serde_json::Value parsing for resp in
clone.rs with a strongly typed EncryptedBlobsResponse struct so schema
mismatches surface as errors: define EncryptedBlobsResponse { blobs:
Vec<EncryptedBlobOrString> } (matching the server shape, analogous to
WithheldPathsResponse), then call
resp.json::<EncryptedBlobsResponse>().await.context("parsing encrypted-blobs")?
and use the returned .blobs directly instead of
body.get(...).and_then(...).cloned().unwrap_or_default(); this ensures missing
or mistyped "blobs" no longer silently becomes [] and will fail fast (update the
EncryptedBlobOrString type to the actual blob element type used).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: fab81d81-dc64-46fe-b486-709131c8a108

📥 Commits

Reviewing files that changed from the base of the PR and between a2217bf and cc30427.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (23)
  • crates/gitlawb-core/Cargo.toml
  • crates/gitlawb-core/src/encrypt.rs
  • crates/gitlawb-core/src/identity.rs
  • crates/gitlawb-core/src/lib.rs
  • crates/gitlawb-node/Cargo.toml
  • crates/gitlawb-node/src/api/encrypted.rs
  • crates/gitlawb-node/src/api/mod.rs
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/api/visibility.rs
  • crates/gitlawb-node/src/arweave.rs
  • crates/gitlawb-node/src/db/mod.rs
  • crates/gitlawb-node/src/encrypted_pin.rs
  • crates/gitlawb-node/src/git/mod.rs
  • crates/gitlawb-node/src/git/smart_http.rs
  • crates/gitlawb-node/src/git/visibility_pack.rs
  • crates/gitlawb-node/src/ipfs_pin.rs
  • crates/gitlawb-node/src/main.rs
  • crates/gitlawb-node/src/pinata.rs
  • crates/gitlawb-node/src/server.rs
  • crates/gitlawb-node/src/sync.rs
  • crates/gitlawb-node/src/visibility.rs
  • crates/gl/src/clone.rs
  • crates/gl/src/main.rs

Comment thread crates/gitlawb-core/src/identity.rs
Comment thread crates/gitlawb-node/src/api/repos.rs
Comment thread crates/gitlawb-node/src/encrypted_pin.rs Outdated
Comment thread crates/gitlawb-node/src/encrypted_pin.rs
Comment thread crates/gitlawb-node/src/ipfs_pin.rs
Comment thread crates/gitlawb-node/src/sync.rs
Comment thread crates/gl/src/clone.rs
Comment thread crates/gl/src/clone.rs
@beardthelion

Copy link
Copy Markdown
Collaborator Author

CodeRabbit's pass here spans the whole unmerged stack (this PR is base=main, so it re-reviews all prereq commits). None of the eight findings fall on this PR's actual delta, which only touches blob_paths, ipfs_pin::list_all_objects, and pinata::list_all_objects. They're routed to where they belong rather than folded in here:

@kevincodex1

Copy link
Copy Markdown
Contributor

hello @beardthelion do you want to be a maintainer of this node repository? I would like to add you as one. let me know if you are interested.

@beardthelion

Copy link
Copy Markdown
Collaborator Author

Thanks @kevincodex1, I'd be glad to. I've enjoyed working on this and I'm happy to take on more responsibility for it.

Couple of things so I know what I'm signing up for: what's the expectation for the role, just merge access, or also triage/releases and general upkeep? And who else maintains it currently?

Either way, count me in.

@kevincodex1

Copy link
Copy Markdown
Contributor

please rebase and fix conflicts

@beardthelion beardthelion force-pushed the fix/withholding-ref-scope branch 2 times, most recently from 90dc8d4 to 67afb20 Compare June 18, 2026 17:59
…adability

The origin no longer stores recipient DIDs. Migration v5 replaces the
encrypted_blobs.recipients column with an opaque, node-keyed recipients_tag
used only to detect a recipient-set change for re-seal. B1 discovery and fetch
are now gated by the same repo-readability check the git read path uses, not by
per-recipient matching; decryption is gated by the envelope crypto, so a
non-recipient who can read the repo sees a blob's {oid, cid} but cannot open it.
encrypt_and_pin keys the tag from the node seed and returns {oid, cid}; the
Arweave manifest tuple drops the now-unused recipient vec.

A DB compromise no longer reveals the reader set; recovering it would require
brute-forcing candidate DID sets against the keyed tag with the node key.
…rror

Address review on the at-rest blinding change:
- The encrypted-blobs/replicate listing returned {oid, cid} with no visibility
  check, so a non-readable repo's blob index was reachable by an unauthenticated
  caller who guessed {owner}/{repo}. Gate it by the same repo-readability check
  discovery and fetch use. For the intended case (a public repo with withheld
  subtrees) the public root keeps this open to peers; only fully non-readable
  repos are withheld, which is the desired behavior.
- encrypt_and_pin treated a recipients_tag DB read error as a cache miss and
  resealed, causing avoidable IPFS writes during a partial outage; skip and retry
  on the next push instead.
- Correct the get_encrypted_blob doc comment to describe repo-readability access.
…nly pin set

blob_paths walked only refs/heads/* and refs/tags/* and skipped silently on a
failed git ls-tree, so a blob reachable only through another namespace, or a ref
that failed to traverse, could fall out of the withheld set and ship in cleartext.
Walk every ref and fail closed on traversal error.

The pin enumerators (ipfs_pin, pinata) used git cat-file --batch-all-objects,
which includes unreachable/dangling objects that have no path and cannot be
classified for withholding. Switch them to git rev-list --objects --all so the
pin set matches the reachable graph blob_paths evaluates.
@beardthelion beardthelion force-pushed the fix/withholding-ref-scope branch from 67afb20 to cd5cda4 Compare June 19, 2026 15:49
…tate

Two correctness fixes flagged in review on this branch.

encrypted_pin: encrypt_and_pin built its recipient key list with
filter_map, silently dropping any DID that does not resolve to an inline
key (did:web / did:gitlawb) and sealing to the remaining subset. Because
recipients_tag is computed over the full DID set, the blob would never be
re-sealed once the dropped DID later resolved, permanently locking that
authorized reader out. resolve_recipient_keys now returns None if any DID
is unresolvable, and the blob is skipped (and retried next push) rather
than sealed to a partial set.

sync: fetch_withheld returns None on any 404/5xx/network/parse error, and
classify_mirror collapsed None into MirrorMode::Plain. On an existing
promisor mirror that downgrade strips the partial-clone config and
--refetches a still-withheld repo, which fails, so a transient
withheld-paths outage broke syncs until the endpoint recovered. decide_mode
now preserves an existing promisor mirror's mode when the lookup is unknown;
fresh-clone fail-closed behavior is unchanged.
@beardthelion

Copy link
Copy Markdown
Collaborator Author

Rebased onto main and conflicts resolved. Branch fast-forwards cleanly on the current main (tip 48c45bc), no conflicts remaining. Ready for review/merge.

@beardthelion beardthelion added kind:bug Defect fix — wrong or unsafe behavior crate:node gitlawb-node — the serving node and REST API subsystem:visibility Path-scoped visibility and content withholding sev:high Major break or real security/trust risk, no easy workaround labels Jun 22, 2026
@beardthelion

Copy link
Copy Markdown
Collaborator Author

Closing in favor of the same-repo PRs this work has been decomposed into, now that the fork-based setup isn't needed:

Nothing is dropped. The #42 work is re-homed in #84, rebased onto current main and with an added HEAD-scope hardening so the withheld set is a superset of both the pin and serve paths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

crate:node gitlawb-node — the serving node and REST API kind:bug Defect fix — wrong or unsafe behavior sev:high Major break or real security/trust risk, no easy workaround subsystem:visibility Path-scoped visibility and content withholding

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Phase 3 withholding: incomplete withheld set can leak blobs in cleartext (ref namespace scope + fail-open ref walk)

2 participants