fix(node): gate repo-listing and stats surfaces on visibility (#97, #99, #101, #104)#111
fix(node): gate repo-listing and stats surfaces on visibility (#97, #99, #101, #104)#111beardthelion wants to merge 9 commits into
Conversation
…thholding (#101) A deny rule authored in NFC byte-compared unequal to a path committed in NFD (or vice versa), so blob_paths/withheld_blob_oids failed to classify the blob and it shipped in cleartext on the replication/pack path. Normalize both the glob prefix and the candidate path to NFC at the single matcher seam (glob_matches + specificity); NFC not NFKC so compatibility forms are not folded into distinct paths. Live serve is unaffected (git resolves byte-exact; the gate only becomes more restrictive). Adds a matcher unit test and a replication-path integration test with the variant byte inside the rule-covered directory name.
…b leak (#99) The withheld set is computed over reachable objects (rev-list --all + HEAD), but the full-scan pin fallback (cat-file --batch-all-objects) includes dangling objects the reachable walk never classified. Subtracting the reachable-only withheld set therefore could not catch a private blob orphaned by a force-push/amend, which pinned to IPFS/Pinata in cleartext. Invert the default for blobs on the full-scan path: a candidate replicates only if it is a non-blob (commits/trees are structural) or a reachable, visibility-allowed blob; anything else (a dangling blob, or a withheld one) is dropped. The delta path keeps the cheaper reachable-only subtraction since delta candidates are always reachable, so no extra walk on the hot path. Adds list_all_objects_with_type/all_blob_oids (typed enumeration), replicable_blob_set + replicable_objects_fail_closed, and threads a full_scan flag through resolve_candidates_for_push. Pin functions now take a pre-filtered object list. Regression tests: dangling blob excluded end-to-end, fail-closed unit semantics, and the all-blob universe includes dangling + excludes non-blobs.
…epo enumeration (#97) list_repos, list_federated_repos, and the GraphQL repos query returned every repo row with no visibility check, so an unauthenticated caller could enumerate private repos' name, owner, description, and clone URL — and X-Total-Count leaked the private-repo count even when rows were filtered. Each surface now keeps a row only when visibility_check at "/" allows the caller, the same decision the per-repo content endpoints make — not a bare is_public filter, which would diverge for is_public=false-with-root-reader and is_public=true-with-root-deny. The "/" decision depends on owner short/full-DID matching and JSON reader-DID membership, so it is applied in Rust over an over-fetched set rather than pushed into SQL; the paged path derives X-Total-Count from the filtered set (dropping the SQL window count) and paginates after filtering, so neither the page nor the count leaks. Adds a batch root-rule query (list_visibility_rules_for_repos) to avoid N+1, and converts list_all_repos_paged into list_all_repos_deduped (mirror dedup preserved, pagination moved to Rust). Tests: anonymous hides private repo + count, owner sees own private repo, authorized root-reader sees an is_public=false repo (proving the visibility_check path, not is_public), and federated listing hides private from anonymous.
…in helper, add listing tests Code-review follow-up on the VIS-subtree fixes (no behavior change to the three shipped fixes; all confirmed sound by review): - DRY the '/' listing gate into one shared visibility::listable_at_root used by list_repos, list_federated_repos, and the GraphQL repos query, so the three surfaces enforce one rule instead of three drifting copies (enforce-shared-rule-on-every-reader-path). - Extract the full-scan fail-closed pin block into fail_closed_full_scan_objects, mirroring replication_withheld_set's degraded-path shape; drop the duplicated comment and the per-site no_rules sentinel (use &[]). - Add the three listing tests the review flagged as missing: anonymous GraphQL repos hides a private repo, the paged X-Total-Count path excludes the private count (the KTD2 exploit shape), and an is_public=true repo under a root deny is not listed (proves the gate is visibility_check, not a bare is_public filter).
…able_at_root Round-2 review follow-up. replication_withheld_set still made the announce decision with an inline visibility_check(.., "/") == Allow — the fourth copy of the listing gate the seam refactor was meant to collapse. Route it through crate::visibility::listable_at_root (caller is always None, the anonymous replication audience), so every reader of the '/' decision now shares one seam. Also drop an em dash from the fail_closed_full_scan_objects doc comment.
…lose the count oracle (#104) The stats endpoint returned count_repos_deduped(): an unfiltered, unauthenticated count of every logical repo including is_public=false and mode-A hide-existence repos. A polling observer could detect that a hidden repo was created (N to N+1), the exact existence signal mode-A suppresses everywhere else. Count only the repos an anonymous caller could list: over-fetch the deduped set, batch-load visibility rules, and keep rows that pass listable_at_root(.., None), mirroring the listing seam in api/repos.rs. The caller is always None (meta_routes carries no auth layer). Fail closed: any DB error collapses the count to 0, since an under-count cannot leak existence. count_repos_deduped is retained as the tested reference for the DEDUP_CTE dedup count (its tests pin the CASE the live list paths share); it has no production caller after this change and is allowed dead outside tests. Tests: bare-private (no-rule branch), hide-existence mode-A with both repos is_public=true (the Some(rule) branch), public-under-root-deny, list-parity, sibling-fields-preserved, and empty-DB.
Three gaps the review flagged as uncovered: - owner-filtered listing (?owner=) still hides a private repo and its count from anonymous (a distinct SQL bind branch from the unfiltered path). - offset past the visible set returns an empty page while X-Total-Count stays the full visible total (guards against deriving the total from the cut page). - a logical repo present as both a canonical row (with a root-deny rule) and a gossip mirror row stays hidden: pins that the DEDUP_CTE picks the canonical survivor so the batch rule lookup finds the deny. If dedup ever picked the mirror (slash-form id, no rule) the gate would fall back to is_public and leak.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughRepo listing, stats, GraphQL, federation, and replication now use root-level visibility checks with bulk-loaded rules. Push candidate resolution carries a full-scan flag, replication filters fail closed, and visibility matching now normalizes NFC/NFD path forms. ChangesVisibility and replication enforcement
Sequence Diagram(s)sequenceDiagram
participant Caller
participant API
participant Db
participant listable_at_root
Caller->>API: request repos or stats
API->>Db: fetch repos and visibility rules
loop each repo
API->>listable_at_root: is_public, owner, caller, rules
listable_at_root-->>API: visible or hidden
end
API-->>Caller: filtered results and visible count
sequenceDiagram
participant git_receive_pack
participant resolve_candidates_for_push
participant fail_closed_full_scan_objects
participant replicable_objects_fail_closed
participant pin_new_objects
git_receive_pack->>resolve_candidates_for_push: new_tips, old_tips
resolve_candidates_for_push-->>git_receive_pack: PinCandidateSet
alt full_scan
git_receive_pack->>fail_closed_full_scan_objects: repo_path, rules
fail_closed_full_scan_objects-->>git_receive_pack: object_list
else delta
git_receive_pack->>replicable_objects_fail_closed: candidates, withheld set
replicable_objects_fail_closed-->>git_receive_pack: object_list
end
git_receive_pack->>pin_new_objects: object_list
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
crates/gitlawb-node/src/test_support.rs (2)
795-808: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick winAssert the federated
countfield too.This test only checks
repos, but/api/v1/repos/federatedalso exposescount. If that field regresses to the pre-filter total, private repos are still enumerable even though the array is filtered.Suggested assertion
let body = json_body(resp).await; let names = names_in(&body["repos"]); + assert_eq!( + body["count"].as_u64(), + Some(1), + "`count` must reflect only the visible federated repos" + ); assert!( names.contains(&"pub-repo".to_string()), "public repo federated" );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/test_support.rs` around lines 795 - 808, The federated repos test in test_support::anon_get("/api/v1/repos/federated") only validates the filtered repos array, so extend it to also assert the response count field matches the visible public repos. Update the existing test near names_in/json_body to check body["count"] reflects the post-filter total and does not include private repos, using the same router/anon_get flow already in place.
828-833: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winAdd a positive-path GraphQL visibility test.
graphql_schema.execute(Request::new(...))only exercises the anonymous context. The resolver pulls the caller DID from GraphQL request data, so this file still needs an owner/authorized-reader case to catch auth-context regressions on the GraphQL surface.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/test_support.rs` around lines 828 - 833, The current GraphQL test only covers the anonymous request path, so add a positive-path visibility case that exercises the authenticated context used by the resolver. Update the existing test support around graphql_schema.execute(Request::new(...)) to also send request data containing the caller DID for an owner or authorized reader, and assert the repos query still returns the expected names. Use the GraphQL resolver/context plumbing in this file to locate the relevant test setup and make sure the auth-context path is covered alongside the anonymous case.crates/gitlawb-node/src/server.rs (1)
463-470: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick winAvoid the star-count join in stats.
/api/v1/statsonly counts visible repos, solist_all_repos_deduped()avoids therepo_starsaggregation while preserving the same dedup seam.♻️ Proposed refactor
- let rows = state.db.list_all_repos_deduped_with_stars(None).await?; - let ids: Vec<String> = rows.iter().map(|(r, _)| r.id.clone()).collect(); + let rows = state.db.list_all_repos_deduped().await?; + let ids: Vec<String> = rows.iter().map(|r| r.id.clone()).collect(); let rules_by_repo = state.db.list_visibility_rules_for_repos(&ids).await?; let count = rows .iter() - .filter(|(r, _)| { + .filter(|r| { let rules = rules_by_repo.get(&r.id).map(Vec::as_slice).unwrap_or(&[]); crate::visibility::listable_at_root(rules, r.is_public, &r.owner_did, None) })🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/server.rs` around lines 463 - 470, The stats handler is still using the stars-joined repo listing, which is unnecessary for /api/v1/stats since it only counts visible repos. Update the logic in server::stats to use list_all_repos_deduped instead of list_all_repos_deduped_with_stars, and keep the existing visibility filtering flow with list_visibility_rules_for_repos and listable_at_root so the dedup behavior stays the same without the repo_stars aggregation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/git/visibility_pack.rs`:
- Around line 1031-1035: The NFC/NFD test currently hardcodes a 40-character
object ID check, which makes it fail for SHA-256 repos. Update the assertion in
visibility_pack’s NFC/NFD test to use the same hash-length guard as the nearby
fixture logic (accepting 40 or 64) and keep the existing check that the
NFD-named blob actually exists, using the relevant secret_oid and fixture
validation paths to locate the change.
---
Nitpick comments:
In `@crates/gitlawb-node/src/server.rs`:
- Around line 463-470: The stats handler is still using the stars-joined repo
listing, which is unnecessary for /api/v1/stats since it only counts visible
repos. Update the logic in server::stats to use list_all_repos_deduped instead
of list_all_repos_deduped_with_stars, and keep the existing visibility filtering
flow with list_visibility_rules_for_repos and listable_at_root so the dedup
behavior stays the same without the repo_stars aggregation.
In `@crates/gitlawb-node/src/test_support.rs`:
- Around line 795-808: The federated repos test in
test_support::anon_get("/api/v1/repos/federated") only validates the filtered
repos array, so extend it to also assert the response count field matches the
visible public repos. Update the existing test near names_in/json_body to check
body["count"] reflects the post-filter total and does not include private repos,
using the same router/anon_get flow already in place.
- Around line 828-833: The current GraphQL test only covers the anonymous
request path, so add a positive-path visibility case that exercises the
authenticated context used by the resolver. Update the existing test support
around graphql_schema.execute(Request::new(...)) to also send request data
containing the caller DID for an owner or authorized reader, and assert the
repos query still returns the expected names. Use the GraphQL resolver/context
plumbing in this file to locate the relevant test setup and make sure the
auth-context path is covered alongside the anonymous case.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ebacf008-a9bc-4e8b-8f33-bbc7e4286cef
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (11)
crates/gitlawb-node/Cargo.tomlcrates/gitlawb-node/src/api/repos.rscrates/gitlawb-node/src/db/mod.rscrates/gitlawb-node/src/git/push_delta.rscrates/gitlawb-node/src/git/visibility_pack.rscrates/gitlawb-node/src/graphql/query.rscrates/gitlawb-node/src/ipfs_pin.rscrates/gitlawb-node/src/pinata.rscrates/gitlawb-node/src/server.rscrates/gitlawb-node/src/test_support.rscrates/gitlawb-node/src/visibility.rs
jatmn
left a comment
There was a problem hiding this comment.
[P2] Keep full-DID owner filters matching mirror-only repos
crates/gitlawb-node/src/db/mod.rs:937
The shared listing CTE still filters owners with owner_did = $1 OR owner_did LIKE '%:' || $1, so ?owner=did:key:z... does not match a mirror-only row stored with the bare owner z.... Before this change, the no-limit /api/v1/repos?owner=... path fetched all rows and filtered with crate::api::did_matches, so full-DID and short owner forms both returned the mirror row; now every owner-filtered listing goes through this SQL filter, including the compatibility no-limit path used by gl repo list. Please normalize the bound owner the same way as the DEDUP owner key or otherwise make the SQL predicate match did_matches, and add a regression with a bare-owner mirror row queried via the full did:key: form.
- visibility_pack NFC/NFD test: accept a SHA-1 or SHA-256 object id (40 | 64) instead of hardcoding 40, matching the fixture guard later in the file, so the test is not SHA-1-only. - federated listing test: also assert the response `count` reflects only the visible repos, guarding against a regression to the pre-filter total. - add an authenticated-caller GraphQL test: an owner sees their own private repo via `repos`, covering the resolver's caller-DID path that the anonymous-only test did not exercise. - stats: count via the no-stars `list_all_repos_deduped()` (same DEDUP_CTE) instead of the stars-joined variant, since the count never needs stars.
…rows (#111) Addresses jatmn's P2. The shared listing CTE filtered owners with `owner_did = $1 OR owner_did LIKE '%:' || $1`, so a full `did:key:z...` query did not match a mirror-only row stored with the bare owner `z...`. #97 routed the no-limit owner-filtered path (used by `gl repo list`) through this SQL predicate, regressing the full-DID-against-bare-mirror direction that `crate::api::did_matches` handled in the old Rust path. Normalize the bound owner to its did:key short form and compare it against the row's owner key using the same CASE the dedup grouping already applies, so full and short forms match symmetrically and a non-key DID still matches only exactly. The no-arg `list_all_repos_deduped()` binds NULL and is unaffected. Regression test: a bare-owner mirror-only row is returned when queried by both the full `did:key:` form and the bare short form. Confirmed RED (test fails) before the fix.
|
@jatmn good catch, and confirmed it was a regression from the #97 rework. Routing the no-limit owner filter through the CTE predicate Fixed in c1636d8: the bound owner is normalized to its did:key short form and compared against the row's owner key via the same CASE the dedup already groups on, so both sides compare on one normalized key. Full and short forms now match symmetrically, a non-key DID still matches only exactly, and the no-arg Added the regression you asked for: a mirror-only row stored with the bare owner is returned when queried by both Re-requesting your review. |
jatmn
left a comment
There was a problem hiding this comment.
No findings here
@kevincodex1 LGTM
Summary
Gates every repo-listing surface and the stats count on per-caller visibility, closing four private-repo enumeration / leak bugs (#97, #99, #101, #104). This is the held visibility-hardening stack rebased onto current
mainnow that #84 has merged; the redundant #84 lineage is dropped and the listing fixes are re-derived on top ofmain's did:key-aware dedup.Motivation & context
Closes #97
Closes #99
Closes #101
Closes #104
Listing and aggregate surfaces returned repos (and counts) without applying the per-caller
"/"visibility decision, so an unauthorized caller could enumerate private repos or infer their existence from a count. Each fix routes its surface through the samevisibility_check-at-"/"predicate the per-repo content endpoints already use.Kind of change
What changed
All in
gitlawb-node:list_repos(REST),list_federated_repos, and the GraphQLreposquery now keep only repos the caller may read at"/"via a sharedcrate::visibility::listable_at_rootseam. Each surface keeps both the existing mirror/canonical dedup and the new visibility filter. The REST path over-fetches the deduped set, filters in Rust, then paginates, derivingX-Total-Countfrom the visible set so neither the page nor the count leaks a private repo. Adds a batchedlist_visibility_rules_for_reposto avoid an N+1./api/v1/statsreposis now the count of anonymously-listable repos instead of the raw deduped total, so a poller can no longer detect a hidden repo's creation. Fails closed to 0 on any DB error.replicable_objects_fail_closed), so a dangling private blob is never replicated.How a reviewer can verify
cargo test --workspace cargo clippy --workspace --all-targets -- -D warnings cargo fmt --all -- --checkThe visibility behavior is covered end to end by
#[sqlx::test]HTTP-API tests incrates/gitlawb-node/src/test_support.rs: private repos and their counts are withheld from anonymous callers across REST (unfiltered, owner-filtered, and paged), GraphQL, and federated listings; owners and authorized root-readers still see their repos;is_public=true-under-root-deny andis_public=false-with-root-reader both behave; and the stats count excludes private/mode-A repos.#99/#101have unit coverage ingit/visibility_pack.rs.Before you request review
cargo test --workspacepasses locallycargo fmt --allandcargo clippy --workspace --all-targets -- -D warningsare cleanfeat(...),fix(...),docs(...)).env.exampleupdated if behavior or config changed (or N/A)Notes for reviewers
Intended behavioral changes that follow from gating on visibility:
X-Total-Countandstats.reposnow reflect what the caller can see rather than every row, the GraphQLreposresult set is caller-filtered, and an anonymous peer-sync fetch returns only public repos (private-repo federation now requires authenticated peer sync).count_repos_dedupedloses its production caller and is kept test-only as the tested reference for the dedup CASE.A couple of pre-existing, out-of-scope cleanups surfaced during review and are left as follow-ups: the federated handler still dedups in Rust while the other surfaces use the SQL CTE, and the per-peer federated fetch timeout only bounds the connect/send phase.
Summary by CodeRabbit
New Features
/api/v1/statsnow reports a visibility-aware repository count.Bug Fixes