Skip to content

fix(node): skip withheld-walk when no path-scoped rule can withhold#60

Open
beardthelion wants to merge 4 commits into
mainfrom
fix/withheld-walk-short-circuit
Open

fix(node): skip withheld-walk when no path-scoped rule can withhold#60
beardthelion wants to merge 4 commits into
mainfrom
fix/withheld-walk-short-circuit

Conversation

@beardthelion

@beardthelion beardthelion commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Summary

Public clones and fetches were paying for an authorization walk they could never benefit from. On every clone/fetch the serve path enumerated every blob path across every ref (a blocking git ls-tree -r per ref) to compute which blobs to withhold, then threw the result away and served a normal pack whenever nothing was withheld, which is always the case for a repo with no path-scoped visibility rules, i.e. the common public repo. The cost got worse with #42, which widened the walk from heads/tags to every ref.

This skips the walk entirely when no rule can withhold anything.

What changed

A shared predicate, has_path_scoped_rule, now gates all three consumers of the walk:

  1. The serve path (git_upload_pack) skips the blocking walk and serves the pack directly.
  2. The post-push replication path's existing empty-rules short-circuit is generalized to also cover root-only rule sets.
  3. The post-push encrypt-then-pin path in git_receive_pack is gated on the same predicate, replacing a !rules.is_empty() check that still ran the per-ref walk (via withheld_blob_recipients) for root-only rule sets.

Routing all three through one predicate keeps the call sites from drifting.

Why it's safe

The short-circuit is only sound because the whole-repo / access gate already runs before every call site: a denying root rule 404s the caller on the serve path, and suppresses announcement on the replication and encrypt-then-pin paths, before the predicate is ever consulted. Any caller that reaches the predicate has therefore already cleared repo-level access, and with no path-scoped rule left there is nothing to withhold. Served-pack contents are unchanged on the success path.

The encrypt-then-pin gate is additionally behavior-preserving: for a root-only rule set withheld_blob_recipients returns an empty recipient map, so encrypt_and_pin was already a no-op; the gate just skips the wasted walk that produced it.

The predicate's doc comment records this precondition explicitly, plus the dependency on validate_path_glob rejecting /** (the only non-/ glob that would match every path).

One intended behavioural difference, on the error path only: a corrupt public/root-only repo whose ref walk would have thrown now skips that walk and goes straight to upload_pack, so a spurious 500 originating in the walk no longer surfaces there. This is the wasted-walk overhead going away, not a change to what content is served.

Tests

  • Predicate unit tests (empty / single-root / path-scoped / mixed / multi-root).
  • A gate-aware safety invariant test: for any caller that has passed the / gate, an empty or root-only rule set yields an empty withheld set, pinning the invariant the skip depends on against future changes to the matching logic.
  • A serve-decision test over a real bare repo covering all three predicate branches: root-only skips the walk and serves the full pack, a reader allowed at / but denied /secret/** is not served the secret blob, and the owner bypasses the rule and is served everything. It composes the same functions git_upload_pack composes; the pack-byte filtering itself is covered by filtered_serve_excludes_withheld_blob in smart_http.rs.
  • Full gitlawb-node suite: 199 passing; cargo fmt --check and cargo clippy clean.

The serve/replication HTTP endpoints have no end-to-end git smart-http harness in this crate, so the call-site wiring is covered at the predicate and serve-decision level rather than via a full request roundtrip.

The adjacent fork-path concern (a fork clones a full mirror after only a root / check) is pre-existing and out of scope here; it is tracked separately in #98.

Closes #50.

Summary by CodeRabbit

  • Bug Fixes

    • Corrected visibility rule handling for repository replication and adjusted withholding behavior for configurations using only whole-repository rules.
  • Performance Improvements

    • Improved pack-serving efficiency by avoiding unnecessary per-object withholding checks when no path-scoped visibility rules are present.
  • Tests

    • Added unit tests for detecting path-scoped visibility rules, covering empty, root-only (allow/deny), mixed, and duplicate rule scenarios, plus safety checks ensuring withheld results remain empty when appropriate.

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds has_path_scoped_rule for non-root visibility rules, uses it to skip withheld-blob work in pack serving, and updates replication enforcement to avoid recipient computation when no path-scoped rule exists.

Changes

Short-circuit withheld-blob walk for non-path-scoped rules

Layer / File(s) Summary
Path-scoped predicate and tests
crates/gitlawb-node/src/git/visibility_pack.rs
Adds has_path_scoped_rule and tests for empty, root-only, mixed, and duplicate-root rule sets, plus a withheld_blob_oids safety case.
Upload-pack and replication enforcement
crates/gitlawb-node/src/api/repos.rs
Uses has_path_scoped_rule to skip withheld-OID computation in git_upload_pack, to return an empty withheld set in replication_withheld_set, and to gate withheld recipient computation in replication enforcement.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Gitlawb/node#28: Introduced the original withheld-blob walk and upload_pack_excluding path that this PR now conditionally skips.
  • Gitlawb/node#34: Changes the same replication withholding flow that now uses has_path_scoped_rule as a gate.
  • Gitlawb/node#40: Touches the same encrypt-then-pin withheld-blob path in git_receive_pack.

Suggested reviewers

  • kevincodex1

Poem

🐇 I sniffed the rules along the path,
And skipped the walk to save the wrath.
Root-only rules? I hop right through,
Withheld blobs vanish from the view.
The pack streams on, both quick and bright,
While secret paths stay tucked from sight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and accurately summarizes the main behavior change: skipping the withheld walk when no path-scoped rule can withhold.
Linked Issues check ✅ Passed The PR fulfills #50 by short-circuiting the serve-path withheld walk when no path-scoped rule exists, matching the issue's acceptance criteria.
Out of Scope Changes check ✅ Passed The added replication-path gate and tests are aligned with the same optimization and do not appear unrelated to the requested change.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description is substantive and covers summary, changes, safety, and tests, though some template sections like kind of change and verification steps are omitted.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/withheld-walk-short-circuit

Comment @coderabbitai help to get the list of available commands.

@beardthelion beardthelion added kind:perf Performance or allocation work, no behavior change crate:node gitlawb-node — the serving node and REST API subsystem:visibility Path-scoped visibility and content withholding sev:low Cosmetic, cleanup, or nice-to-have labels Jun 22, 2026

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR adds a safe short-circuit that skips the expensive per-ref withheld-blob walk when visibility rules contain only whole-repo / entries, and wires that predicate into both fetch serving and post-push replication.

Findings

No findings.

Please rebase on main and clear merge conflict.

…t-circuit

Shared predicate (rules.iter().any(path_glob != "/")) both serve and
replication paths will use to skip the per-blob withheld walk when no rule
can withhold. Doc records the "/"-gate precondition and the validate_path_glob
dependency. Adds unit tests plus a gate-aware safety invariant test.
)

The serve path (git_upload_pack) ran the full per-ref withheld_blob_oids
walk on every clone/fetch before falling back to a plain upload_pack, even
for repos with no path-scoped rules where the walk can never withhold
anything. Cost grew with #42 (per-ref git ls-tree -r).

Guard both call sites on has_path_scoped_rule: the serve path skips the
spawn_blocking walk entirely and serves upload_pack directly; the
replication path generalizes its is_empty() short-circuit to also cover
root-only rule sets. Safe because the whole-repo "/" gate runs before both
sites, so a denying root rule 404s/withholds upstream and never reaches the
predicate. No change to served-pack contents on the success path.
@beardthelion beardthelion force-pushed the fix/withheld-walk-short-circuit branch from 75acc57 to 3f8bf0b Compare June 24, 2026 13:28

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 841-846: The recipient-computation branch still runs for root-only
rules because `rules_for_enc.filter(|r| !r.is_empty())` is too broad, so update
the logic in `repos.rs` to gate the `withheld_blob_recipients` path using
`has_path_scoped_rule` instead. Keep the existing `rules_for_enc` handling, but
only spawn the blocking recipient walk when a path-scoped rule is present;
otherwise skip the encrypted-recipient computation entirely to avoid the
ref-wide withheld walk.
- Around line 1222-1225: The fork flow in authorize_repo_read is only checking
root visibility before later performing a full mirror clone, which can expose
path-scoped content to callers who can read “/” but not all blobs. Update the
repo-fork path around authorize_repo_read and the downstream git clone --mirror
flow to either fail closed for non-owners when full mirror access is not
authorized, or build the fork from the same filtered projection used by
git_upload_pack instead of cloning the entire repository. Preserve
not-found-style behavior for protected-data access failures, and ensure any
owner-only bypass uses did_matches(...) for DID checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bc197409-41a0-44cd-bd8a-3cd966cd3652

📥 Commits

Reviewing files that changed from the base of the PR and between 75acc57 and 3f8bf0b.

📒 Files selected for processing (2)
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/git/visibility_pack.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/gitlawb-node/src/git/visibility_pack.rs

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 841-846: The recipient-computation branch still runs for root-only
rules because `rules_for_enc.filter(|r| !r.is_empty())` is too broad, so update
the logic in `repos.rs` to gate the `withheld_blob_recipients` path using
`has_path_scoped_rule` instead. Keep the existing `rules_for_enc` handling, but
only spawn the blocking recipient walk when a path-scoped rule is present;
otherwise skip the encrypted-recipient computation entirely to avoid the
ref-wide withheld walk.
- Around line 1222-1225: The fork flow in authorize_repo_read is only checking
root visibility before later performing a full mirror clone, which can expose
path-scoped content to callers who can read “/” but not all blobs. Update the
repo-fork path around authorize_repo_read and the downstream git clone --mirror
flow to either fail closed for non-owners when full mirror access is not
authorized, or build the fork from the same filtered projection used by
git_upload_pack instead of cloning the entire repository. Preserve
not-found-style behavior for protected-data access failures, and ensure any
owner-only bypass uses did_matches(...) for DID checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bc197409-41a0-44cd-bd8a-3cd966cd3652

📥 Commits

Reviewing files that changed from the base of the PR and between 75acc57 and 3f8bf0b.

📒 Files selected for processing (2)
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/git/visibility_pack.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/gitlawb-node/src/git/visibility_pack.rs
🛑 Comments failed to post (2)
crates/gitlawb-node/src/api/repos.rs (2)

841-846: 🚀 Performance & Scalability | 🟠 Major | ⚡ Quick win

Use has_path_scoped_rule before computing encrypted recipients.

rules_for_enc.filter(|r| !r.is_empty()) is still true for root-only rules, so withheld_blob_recipients can still run the ref-wide withheld walk this PR is trying to avoid.

Proposed fix
-            if let Some(rules) = rules_for_enc.filter(|r| !r.is_empty()) {
+            if let Some(rules) =
+                rules_for_enc.filter(|r| crate::git::visibility_pack::has_path_scoped_rule(r))
+            {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

            if let Some(rules) =
                rules_for_enc.filter(|r| crate::git::visibility_pack::has_path_scoped_rule(r))
            {
                let p = repo_path_clone.clone();
                let owner = owner_did.clone();
                let recip = tokio::task::spawn_blocking(move || {
                    crate::git::visibility_pack::withheld_blob_recipients(
                        &p, &rules, is_public, &owner,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/api/repos.rs` around lines 841 - 846, The
recipient-computation branch still runs for root-only rules because
`rules_for_enc.filter(|r| !r.is_empty())` is too broad, so update the logic in
`repos.rs` to gate the `withheld_blob_recipients` path using
`has_path_scoped_rule` instead. Keep the existing `rules_for_enc` handling, but
only spawn the blocking recipient walk when a path-scoped rule is present;
otherwise skip the encrypted-recipient computation entirely to avoid the
ref-wide withheld walk.

1222-1225: 🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

Do not fork a full mirror after only a root visibility check.

Line 1224 authorizes only "/", but the later git clone --mirror copies every object. A non-owner who can read the root but is denied path-scoped blobs can fork the repo to obtain the withheld content, bypassing the filtered git_upload_pack path.

Conservative fail-closed guard
-    let (source, _rules) =
+    let (source, source_rules) =
         crate::api::authorize_repo_read(&state, &owner, &name, Some(auth.0.as_str()), "/").await?;
+    if crate::git::visibility_pack::has_path_scoped_rule(&source_rules)
+        && !crate::api::did_matches(auth.0.as_str(), &source.owner_did)
+    {
+        return Err(AppError::RepoNotFound(format!("{owner}/{name}")));
+    }

If non-owner forks of path-scoped repos should remain supported, build the fork from the same filtered projection instead of cloning the full mirror. Based on learnings, protected-data access failures should preserve not-found-style behavior, and owner DID checks should use did_matches(...).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    // Enforce read visibility on the source before cloning: an unauthorized
    // caller must not be able to fork (full mirror) a repo they cannot read.
    let (source, source_rules) =
        crate::api::authorize_repo_read(&state, &owner, &name, Some(auth.0.as_str()), "/").await?;
    if crate::git::visibility_pack::has_path_scoped_rule(&source_rules)
        && !crate::api::did_matches(auth.0.as_str(), &source.owner_did)
    {
        return Err(AppError::RepoNotFound(format!("{owner}/{name}")));
    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/api/repos.rs` around lines 1222 - 1225, The fork flow
in authorize_repo_read is only checking root visibility before later performing
a full mirror clone, which can expose path-scoped content to callers who can
read “/” but not all blobs. Update the repo-fork path around authorize_repo_read
and the downstream git clone --mirror flow to either fail closed for non-owners
when full mirror access is not authorized, or build the fork from the same
filtered projection used by git_upload_pack instead of cloning the entire
repository. Preserve not-found-style behavior for protected-data access
failures, and ensure any owner-only bypass uses did_matches(...) for DID checks.

Source: Learnings

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

1. Encrypt-then-pin branch still runs the ref-wide walk for root-only rules

Location: crates/gitlawb-node/src/api/repos.rs, lines 839-846

The post-push encrypt-then-pin path gates the withheld_blob_recipients call with rules_for_enc.filter(|r| !r.is_empty()). That condition is true for any non-empty rule set, including a root-only / rule. withheld_blob_recipients then calls withheld_blob_oids, which performs the same git ls-tree -r per-ref walk this PR is trying to avoid.

This is the third consumer of the ref-wide walk, and it was missed by the PR's short-circuit. The same predicate used for the other two call sites should be applied here:

if let Some(rules) = rules_for_enc
    .filter(|r| crate::git::visibility_pack::has_path_scoped_rule(r))
{
    // ... spawn withheld_blob_recipients ...
}

Impact: Root-only repos continue to pay the per-push blocking walk cost on every push that triggers encrypt-then-pin, partially defeating the PR's performance goal.

Suggested action: Gate the withheld_blob_recipients block with has_path_scoped_rule, mirroring the other two call sites.


2. Fork endpoint clones a full mirror after only a root visibility check

Location: crates/gitlawb-node/src/api/repos.rs, lines 1216-1273

fork_repo authorizes the caller only at the whole-repo / path and then runs git clone --mirror of the source repository. A caller who is allowed at / but denied a path-scoped subtree (e.g., /secret/**) can therefore fork the entire repo and obtain blobs that the filtered git_upload_pack path would have withheld.

This is a pre-existing issue, but it is directly adjacent to the visibility logic this PR is optimizing. CodeRabbit flagged it as a critical security concern in the most recent review.

Impact: Non-owner callers with root access but without subtree access can bypass path-scoped visibility by forking.

Suggested action: Either fail closed for non-owners when the source has path-scoped rules, or build the fork from the same filtered projection used by git_upload_pack rather than from a full mirror. Preserve the existing not-found-style behavior for protected-data failures and use did_matches(...) for any owner-only bypass.

The post-push encrypt-then-pin block in git_receive_pack gated the
withheld_blob_recipients call on `!r.is_empty()`, which is true for a
root-only `/` rule set. withheld_blob_recipients then runs the same
per-ref `git ls-tree` walk (withheld_blob_oids) the short-circuit is
meant to avoid, and returns an empty recipient map anyway. Gate it on
has_path_scoped_rule to match the serve and replication sites; the skip
is behavior-preserving (empty map -> encrypt_and_pin is a no-op).

Also align the two remaining call sites to the visibility_pack import
alias for consistency.
… branches

Drives the git_upload_pack serve decision over a real bare repo for the
INV-2 caller (a reader allowed at whole-repo "/" but denied a path-scoped
subtree), exercising both branches the predicate selects: root-only rules
skip the walk and serve the full pack (asserting the walk would withhold
nothing, so the skip is sound), and a /secret/** rule runs the walk and
excludes the secret blob from the served set while keeping the public one.
@beardthelion

Copy link
Copy Markdown
Collaborator Author

Both addressed.

#1 — encrypt-then-pin ran the ref-wide walk for root-only rules. Fixed in cf81708: the block now gates on has_path_scoped_rule, matching the serve and replication sites. It's behavior-preserving for the root-only case (the walk would return an empty recipient map, so encrypt_and_pin was already a no-op) and unchanged for path-scoped repos.

#2 — fork clones a full mirror after only a root check. Pre-existing and a separate authz concern, so I split it to #98 rather than fold it in here. Tracked, not dropped.

Also added a serve-decision test (1d5589b) over a real bare repo covering all three predicate branches: root-only skips the walk and serves the full pack, a reader denied /secret/** is not served the secret blob, and the owner bypasses the rule and gets everything. One coverage boundary worth naming: it composes the same functions git_upload_pack composes rather than driving the handler through a git smart-http roundtrip (no such harness on this branch); the pack-byte filtering itself is already covered by filtered_serve_excludes_withheld_blob in smart_http.rs.

@beardthelion beardthelion requested a review from jatmn June 24, 2026 16:35

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. I do not see any actionable issues from my review.

@kevincodex1 LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

crate:node gitlawb-node — the serving node and REST API kind:perf Performance or allocation work, no behavior change sev:low Cosmetic, cleanup, or nice-to-have subsystem:visibility Path-scoped visibility and content withholding

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Serve path: short-circuit withheld walk when no path-scoped rule can withhold

2 participants