fix(node): skip withheld-walk when no path-scoped rule can withhold#60
fix(node): skip withheld-walk when no path-scoped rule can withhold#60beardthelion wants to merge 4 commits into
Conversation
📝 WalkthroughWalkthroughAdds ChangesShort-circuit withheld-blob walk for non-path-scoped rules
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
jatmn
left a comment
There was a problem hiding this comment.
Summary
This PR adds a safe short-circuit that skips the expensive per-ref withheld-blob walk when visibility rules contain only whole-repo / entries, and wires that predicate into both fetch serving and post-push replication.
Findings
No findings.
Please rebase on main and clear merge conflict.
…t-circuit Shared predicate (rules.iter().any(path_glob != "/")) both serve and replication paths will use to skip the per-blob withheld walk when no rule can withhold. Doc records the "/"-gate precondition and the validate_path_glob dependency. Adds unit tests plus a gate-aware safety invariant test.
) The serve path (git_upload_pack) ran the full per-ref withheld_blob_oids walk on every clone/fetch before falling back to a plain upload_pack, even for repos with no path-scoped rules where the walk can never withhold anything. Cost grew with #42 (per-ref git ls-tree -r). Guard both call sites on has_path_scoped_rule: the serve path skips the spawn_blocking walk entirely and serves upload_pack directly; the replication path generalizes its is_empty() short-circuit to also cover root-only rule sets. Safe because the whole-repo "/" gate runs before both sites, so a denying root rule 404s/withholds upstream and never reaches the predicate. No change to served-pack contents on the success path.
75acc57 to
3f8bf0b
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 841-846: The recipient-computation branch still runs for root-only
rules because `rules_for_enc.filter(|r| !r.is_empty())` is too broad, so update
the logic in `repos.rs` to gate the `withheld_blob_recipients` path using
`has_path_scoped_rule` instead. Keep the existing `rules_for_enc` handling, but
only spawn the blocking recipient walk when a path-scoped rule is present;
otherwise skip the encrypted-recipient computation entirely to avoid the
ref-wide withheld walk.
- Around line 1222-1225: The fork flow in authorize_repo_read is only checking
root visibility before later performing a full mirror clone, which can expose
path-scoped content to callers who can read “/” but not all blobs. Update the
repo-fork path around authorize_repo_read and the downstream git clone --mirror
flow to either fail closed for non-owners when full mirror access is not
authorized, or build the fork from the same filtered projection used by
git_upload_pack instead of cloning the entire repository. Preserve
not-found-style behavior for protected-data access failures, and ensure any
owner-only bypass uses did_matches(...) for DID checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bc197409-41a0-44cd-bd8a-3cd966cd3652
📒 Files selected for processing (2)
crates/gitlawb-node/src/api/repos.rscrates/gitlawb-node/src/git/visibility_pack.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/gitlawb-node/src/git/visibility_pack.rs
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 841-846: The recipient-computation branch still runs for root-only
rules because `rules_for_enc.filter(|r| !r.is_empty())` is too broad, so update
the logic in `repos.rs` to gate the `withheld_blob_recipients` path using
`has_path_scoped_rule` instead. Keep the existing `rules_for_enc` handling, but
only spawn the blocking recipient walk when a path-scoped rule is present;
otherwise skip the encrypted-recipient computation entirely to avoid the
ref-wide withheld walk.
- Around line 1222-1225: The fork flow in authorize_repo_read is only checking
root visibility before later performing a full mirror clone, which can expose
path-scoped content to callers who can read “/” but not all blobs. Update the
repo-fork path around authorize_repo_read and the downstream git clone --mirror
flow to either fail closed for non-owners when full mirror access is not
authorized, or build the fork from the same filtered projection used by
git_upload_pack instead of cloning the entire repository. Preserve
not-found-style behavior for protected-data access failures, and ensure any
owner-only bypass uses did_matches(...) for DID checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bc197409-41a0-44cd-bd8a-3cd966cd3652
📒 Files selected for processing (2)
crates/gitlawb-node/src/api/repos.rscrates/gitlawb-node/src/git/visibility_pack.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/gitlawb-node/src/git/visibility_pack.rs
🛑 Comments failed to post (2)
crates/gitlawb-node/src/api/repos.rs (2)
841-846: 🚀 Performance & Scalability | 🟠 Major | ⚡ Quick win
Use
has_path_scoped_rulebefore computing encrypted recipients.
rules_for_enc.filter(|r| !r.is_empty())is still true for root-only rules, sowithheld_blob_recipientscan still run the ref-wide withheld walk this PR is trying to avoid.Proposed fix
- if let Some(rules) = rules_for_enc.filter(|r| !r.is_empty()) { + if let Some(rules) = + rules_for_enc.filter(|r| crate::git::visibility_pack::has_path_scoped_rule(r)) + {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.if let Some(rules) = rules_for_enc.filter(|r| crate::git::visibility_pack::has_path_scoped_rule(r)) { let p = repo_path_clone.clone(); let owner = owner_did.clone(); let recip = tokio::task::spawn_blocking(move || { crate::git::visibility_pack::withheld_blob_recipients( &p, &rules, is_public, &owner,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/repos.rs` around lines 841 - 846, The recipient-computation branch still runs for root-only rules because `rules_for_enc.filter(|r| !r.is_empty())` is too broad, so update the logic in `repos.rs` to gate the `withheld_blob_recipients` path using `has_path_scoped_rule` instead. Keep the existing `rules_for_enc` handling, but only spawn the blocking recipient walk when a path-scoped rule is present; otherwise skip the encrypted-recipient computation entirely to avoid the ref-wide withheld walk.
1222-1225: 🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win
Do not fork a full mirror after only a root visibility check.
Line 1224 authorizes only
"/", but the latergit clone --mirrorcopies every object. A non-owner who can read the root but is denied path-scoped blobs can fork the repo to obtain the withheld content, bypassing the filteredgit_upload_packpath.Conservative fail-closed guard
- let (source, _rules) = + let (source, source_rules) = crate::api::authorize_repo_read(&state, &owner, &name, Some(auth.0.as_str()), "/").await?; + if crate::git::visibility_pack::has_path_scoped_rule(&source_rules) + && !crate::api::did_matches(auth.0.as_str(), &source.owner_did) + { + return Err(AppError::RepoNotFound(format!("{owner}/{name}"))); + }If non-owner forks of path-scoped repos should remain supported, build the fork from the same filtered projection instead of cloning the full mirror. Based on learnings, protected-data access failures should preserve not-found-style behavior, and owner DID checks should use
did_matches(...).📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.// Enforce read visibility on the source before cloning: an unauthorized // caller must not be able to fork (full mirror) a repo they cannot read. let (source, source_rules) = crate::api::authorize_repo_read(&state, &owner, &name, Some(auth.0.as_str()), "/").await?; if crate::git::visibility_pack::has_path_scoped_rule(&source_rules) && !crate::api::did_matches(auth.0.as_str(), &source.owner_did) { return Err(AppError::RepoNotFound(format!("{owner}/{name}"))); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/repos.rs` around lines 1222 - 1225, The fork flow in authorize_repo_read is only checking root visibility before later performing a full mirror clone, which can expose path-scoped content to callers who can read “/” but not all blobs. Update the repo-fork path around authorize_repo_read and the downstream git clone --mirror flow to either fail closed for non-owners when full mirror access is not authorized, or build the fork from the same filtered projection used by git_upload_pack instead of cloning the entire repository. Preserve not-found-style behavior for protected-data access failures, and ensure any owner-only bypass uses did_matches(...) for DID checks.Source: Learnings
jatmn
left a comment
There was a problem hiding this comment.
Findings
1. Encrypt-then-pin branch still runs the ref-wide walk for root-only rules
Location: crates/gitlawb-node/src/api/repos.rs, lines 839-846
The post-push encrypt-then-pin path gates the withheld_blob_recipients call with rules_for_enc.filter(|r| !r.is_empty()). That condition is true for any non-empty rule set, including a root-only / rule. withheld_blob_recipients then calls withheld_blob_oids, which performs the same git ls-tree -r per-ref walk this PR is trying to avoid.
This is the third consumer of the ref-wide walk, and it was missed by the PR's short-circuit. The same predicate used for the other two call sites should be applied here:
if let Some(rules) = rules_for_enc
.filter(|r| crate::git::visibility_pack::has_path_scoped_rule(r))
{
// ... spawn withheld_blob_recipients ...
}Impact: Root-only repos continue to pay the per-push blocking walk cost on every push that triggers encrypt-then-pin, partially defeating the PR's performance goal.
Suggested action: Gate the withheld_blob_recipients block with has_path_scoped_rule, mirroring the other two call sites.
2. Fork endpoint clones a full mirror after only a root visibility check
Location: crates/gitlawb-node/src/api/repos.rs, lines 1216-1273
fork_repo authorizes the caller only at the whole-repo / path and then runs git clone --mirror of the source repository. A caller who is allowed at / but denied a path-scoped subtree (e.g., /secret/**) can therefore fork the entire repo and obtain blobs that the filtered git_upload_pack path would have withheld.
This is a pre-existing issue, but it is directly adjacent to the visibility logic this PR is optimizing. CodeRabbit flagged it as a critical security concern in the most recent review.
Impact: Non-owner callers with root access but without subtree access can bypass path-scoped visibility by forking.
Suggested action: Either fail closed for non-owners when the source has path-scoped rules, or build the fork from the same filtered projection used by git_upload_pack rather than from a full mirror. Preserve the existing not-found-style behavior for protected-data failures and use did_matches(...) for any owner-only bypass.
The post-push encrypt-then-pin block in git_receive_pack gated the withheld_blob_recipients call on `!r.is_empty()`, which is true for a root-only `/` rule set. withheld_blob_recipients then runs the same per-ref `git ls-tree` walk (withheld_blob_oids) the short-circuit is meant to avoid, and returns an empty recipient map anyway. Gate it on has_path_scoped_rule to match the serve and replication sites; the skip is behavior-preserving (empty map -> encrypt_and_pin is a no-op). Also align the two remaining call sites to the visibility_pack import alias for consistency.
… branches Drives the git_upload_pack serve decision over a real bare repo for the INV-2 caller (a reader allowed at whole-repo "/" but denied a path-scoped subtree), exercising both branches the predicate selects: root-only rules skip the walk and serve the full pack (asserting the walk would withhold nothing, so the skip is sound), and a /secret/** rule runs the walk and excludes the secret blob from the served set while keeping the public one.
|
Both addressed. #1 — encrypt-then-pin ran the ref-wide walk for root-only rules. Fixed in cf81708: the block now gates on #2 — fork clones a full mirror after only a root check. Pre-existing and a separate authz concern, so I split it to #98 rather than fold it in here. Tracked, not dropped. Also added a serve-decision test (1d5589b) over a real bare repo covering all three predicate branches: root-only skips the walk and serves the full pack, a reader denied |
jatmn
left a comment
There was a problem hiding this comment.
Thanks for the contribution. I do not see any actionable issues from my review.
@kevincodex1 LGTM
Summary
Public clones and fetches were paying for an authorization walk they could never benefit from. On every clone/fetch the serve path enumerated every blob path across every ref (a blocking
git ls-tree -rper ref) to compute which blobs to withhold, then threw the result away and served a normal pack whenever nothing was withheld, which is always the case for a repo with no path-scoped visibility rules, i.e. the common public repo. The cost got worse with #42, which widened the walk from heads/tags to every ref.This skips the walk entirely when no rule can withhold anything.
What changed
A shared predicate,
has_path_scoped_rule, now gates all three consumers of the walk:git_upload_pack) skips the blocking walk and serves the pack directly.git_receive_packis gated on the same predicate, replacing a!rules.is_empty()check that still ran the per-ref walk (viawithheld_blob_recipients) for root-only rule sets.Routing all three through one predicate keeps the call sites from drifting.
Why it's safe
The short-circuit is only sound because the whole-repo
/access gate already runs before every call site: a denying root rule 404s the caller on the serve path, and suppresses announcement on the replication and encrypt-then-pin paths, before the predicate is ever consulted. Any caller that reaches the predicate has therefore already cleared repo-level access, and with no path-scoped rule left there is nothing to withhold. Served-pack contents are unchanged on the success path.The encrypt-then-pin gate is additionally behavior-preserving: for a root-only rule set
withheld_blob_recipientsreturns an empty recipient map, soencrypt_and_pinwas already a no-op; the gate just skips the wasted walk that produced it.The predicate's doc comment records this precondition explicitly, plus the dependency on
validate_path_globrejecting/**(the only non-/glob that would match every path).One intended behavioural difference, on the error path only: a corrupt public/root-only repo whose ref walk would have thrown now skips that walk and goes straight to
upload_pack, so a spurious 500 originating in the walk no longer surfaces there. This is the wasted-walk overhead going away, not a change to what content is served.Tests
/gate, an empty or root-only rule set yields an empty withheld set, pinning the invariant the skip depends on against future changes to the matching logic./but denied/secret/**is not served the secret blob, and the owner bypasses the rule and is served everything. It composes the same functionsgit_upload_packcomposes; the pack-byte filtering itself is covered byfiltered_serve_excludes_withheld_blobinsmart_http.rs.gitlawb-nodesuite: 199 passing;cargo fmt --checkandcargo clippyclean.The serve/replication HTTP endpoints have no end-to-end git smart-http harness in this crate, so the call-site wiring is covered at the predicate and serve-decision level rather than via a full request roundtrip.
The adjacent fork-path concern (a fork clones a full mirror after only a root
/check) is pre-existing and out of scope here; it is tracked separately in #98.Closes #50.
Summary by CodeRabbit
Bug Fixes
Performance Improvements
Tests