Skip to content

feat(storage): storage-agnostic BlobStore layer + push-latency quick wins#79

Open
kevincodex1 wants to merge 5 commits into
mainfrom
feat/storage-abstraction
Open

feat(storage): storage-agnostic BlobStore layer + push-latency quick wins#79
kevincodex1 wants to merge 5 commits into
mainfrom
feat/storage-abstraction

Conversation

@kevincodex1

@kevincodex1 kevincodex1 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Replace the hardcoded Tigris client with a backend-agnostic object store.

BlobStore trait (get/put/head/delete/list) with backends:

  • s3: any S3-compatible service (Tigris, Cloudflare R2, AWS S3, MinIO, B2)
  • fs: local filesystem (atomic writes; unit-tested)
  • ipfs: Kubo node via MFS (etag = content CID)

RepoArchive composes a bare repo into one repos/v1/{slug}/{repo}.tar.zst object on top of any backend (tar+zstd moved here; git/tigris.rs removed). RepoStore rewired from Option to Option.

Config (backward compatible): GITLAWB_STORAGE_BACKEND, GITLAWB_S3_BUCKET, GITLAWB_S3_ENDPOINT, GITLAWB_S3_FORCE_PATH_STYLE, GITLAWB_STORAGE_FS_DIR, GITLAWB_ASYNC_UPLOAD. GITLAWB_TIGRIS_BUCKET kept as a legacy alias.

Push-latency quick wins:

  • Skip the redundant pre-write download when the cached etag already matches storage (previously up to two full-repo downloads per push).
  • Write-back ack: ack the client before the durable upload completes; the advisory lock is held until upload finishes so cross-node consistency holds.

cargo check clean; storage::fs + repo_store unit tests pass.

Closes #89

Summary by CodeRabbit

Release Notes

  • New Features

    • Added pluggable repository archive storage backends (S3-compatible, local filesystem, IPFS) with automatic selection and configurable S3 endpoint/path-style.
  • Bug Fixes

    • Improved repo synchronization using remote metadata (etags) to skip unnecessary downloads and fail closed when write-path storage verification fails.
    • Updated push/merge/issue lock handling so durable upload failures are surfaced appropriately and uploads are driven only by successful operations (including optional background upload).
  • Tests

    • Added unit tests for etag-based download skipping, corrupt-archive behavior, and filesystem concurrency.
  • Chores

    • Updated configuration and storage initialization; removed the legacy Tigris backend module.

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The PR replaces the hard-coded Tigris storage client with a BlobStore trait and three concrete backends (S3-compatible, filesystem, IPFS). RepoArchive is rewritten on top of BlobStore, and RepoStore gains etag-based staleness detection with sync_down_if_stale and pinned advisory-lock connection management. A new async_upload config flag controls whether the push acknowledgement waits for the durable upload. API handlers now propagate durable upload failures as errors instead of silently ignoring them.

Changes

Pluggable Storage Backend and Etag-Driven Caching

Layer / File(s) Summary
BlobStore trait, ObjectMeta, storage configuration, and key validation
crates/gitlawb-node/Cargo.toml, crates/gitlawb-node/src/storage/mod.rs, crates/gitlawb-node/src/config.rs
Adds ObjectMeta carrying size and optional etag; defines async BlobStore trait for get/put/head/delete/list; implements build() to select backend with explicit precedence over auto-detection; adds validate_key to reject unsafe blob keys; declares storage submodules; adds config fields (storage_backend, s3_bucket, s3_endpoint, s3_force_path_style, storage_fs_dir, async_upload) with auto-detection rules; updates tigris_bucket documentation to mark it as legacy; adds async-trait dependency.
S3-compatible backend
crates/gitlawb-node/src/storage/s3.rs
Adds S3BlobStore implementing BlobStore against aws-sdk-s3 with optional endpoint override and path-style addressing; implements get (returns None on not-found), put (atomic upload with optional ETag), head (metadata without download), delete (idempotent), and paginated list via list_objects_v2.
Filesystem backend with atomic swap and concurrency safety
crates/gitlawb-node/src/storage/fs.rs
Adds FsBlobStore for local-disk storage; uses atomic temp-file + rename for put (in blocking task); derives size-mtime etag from file metadata; implements recursive prefix-filtered list; path-escape defense prevents traversal attacks; includes tests for round-trip, rejected traversal keys, and concurrent puts.
IPFS Kubo MFS backend
crates/gitlawb-node/src/storage/ipfs.rs
Adds IpfsBlobStore using Kubo HTTP API for MFS-backed storage under /gitlawb/<key>; implements get (None on missing-path), put (streaming multipart with optional etag suppression), head (Size + Hash as etag), delete (idempotent), and iterative-DFS list with best-effort subtree traversal.
Backend-agnostic RepoArchive abstraction
crates/gitlawb-node/src/storage/archive.rs
Replaces TigrisClient with RepoArchive backed by Arc<dyn BlobStore>; provides head_etag, exists, upload (spawn-blocking tar.zst + store.put), download (store.get + spawn-blocking decompress), and delete; implements compress_repo (tar + zstd), publish_lock per-destination serialization, and decompress_repo (temp dir unpacking + atomic swap with stale cleanup); includes round-trip, corrupt-archive, and FsBlobStore integration tests.
RepoStore etag-cache and pinned advisory-lock refactor
crates/gitlawb-node/src/git/repo_store.rs, crates/gitlawb-node/src/git/mod.rs, crates/gitlawb-node/src/db/mod.rs
Refactors RepoStore to hold optional RepoArchive, dedicated lock_pool, and shared migrated/versions etag caches; introduces sync_down_if_stale with etag-skip on match and self-heal on read vs fail-closed on write; updates acquire, acquire_fresh, acquire_write (pins SQLx connection for lock/sync/unlock with retry), init, release_after_write, and RepoWriteGuard::release to use archive; removes tigris git submodule and public pool() accessor from Db.
Main startup storage wiring and async_upload post-push control
crates/gitlawb-node/src/main.rs, crates/gitlawb-node/src/api/repos.rs
Adds storage module declaration; replaces Tigris initialization with fail-fast storage::build(config) + RepoArchive; creates dedicated advisory-lock PgPool; updates git_receive_pack to spawn guard.release in background on successful push when async_upload enabled, otherwise awaits it synchronously.
API handlers error handling for durable upload failures
crates/gitlawb-node/src/api/issues.rs, crates/gitlawb-node/src/api/pulls.rs, crates/gitlawb-node/src/api/repos.rs, crates/gitlawb-node/src/state.rs
Updates create_issue, close_issue, merge_pr, and fork_repo to capture and surface durable upload failures from guard.release() and repo_store.release_after_write() as AppError::Git; adds explicit early-return lock releases before error paths in close_issue; updates inline comments to reference generic "storage" instead of "Tigris".

Sequence Diagram(s)

sequenceDiagram
  participant Client as Git Client
  participant Handler as git_receive_pack
  participant Store as RepoStore
  participant Archive as RepoArchive
  participant Blob as BlobStore (S3/FS/IPFS)

  Client->>Handler: HTTP push
  Handler->>Store: acquire_write(key)
  Store->>Store: pg_try_advisory_lock (with retry)
  Store->>Store: sync_down_if_stale(require_fresh=true)
  Store->>Archive: head_etag(owner, repo)
  Archive->>Blob: head(key)
  Blob-->>Archive: ObjectMeta {size, etag}
  alt etag matches cached versions
    Archive-->>Store: etag found, skip download
  else stale or missing
    Archive->>Blob: get(key)
    Blob-->>Archive: tar.zst bytes
    Archive->>Archive: decompress_repo to local_path
    Archive-->>Store: versions.insert(key, etag)
  end
  Store-->>Handler: RepoWriteGuard {conn, archive, versions}
  Handler->>Handler: smart_http::receive_pack
  Handler-->>Client: HTTP response (push acked)
  alt push_ok AND async_upload
    Handler->>Store: tokio::spawn guard.release(true)
    Store->>Archive: upload(owner, repo, local_path)
    Archive->>Blob: put(key, tar.zst)
    Blob-->>Archive: ObjectMeta {etag}
    Store->>Store: versions.insert + pg_advisory_unlock
  else push failed OR sync upload
    Handler->>Store: guard.release(push_ok).await
    Store->>Archive: upload → store.put
    Store->>Store: pg_advisory_unlock
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Gitlawb/node#54: Both PRs update the same advisory-lock/repo-write success gating logic around guard.release(...)/RepoWriteGuard::release(success) in API handlers and RepoStore to avoid propagating bad writes when hydration/updates fail.
  • Gitlawb/node#80: Both PRs touch the same crates/gitlawb-node/src/git/tigris.rs code paths (e.g., download/decompress_repo and publish locking logic), while the main PR ultimately removes the tigris module as part of the storage backend migration.
  • Gitlawb/node#36: Both PRs modify the git_receive_pack push handler in crates/gitlawb-node/src/api/repos.rs, with main PR changing advisory-lock release timing around async_upload and retrieved PR adding replication/encryption enforcement in the same code path.

Suggested labels

kind:perf

Poem

🐇 No more single-store lock-in for me,
Three backends now dance—S3, IPFS, and FS free!
Etags skip stale downloads with glee,
Async uploads let pushes race without me.
Pluggable storage: the key to deploy! 🗝️

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and concisely summarizes the main change: introducing a storage-agnostic BlobStore layer and push-latency improvements.
Description check ✅ Passed The PR description comprehensively covers the changes: BlobStore trait with multiple backend implementations, RepoArchive abstraction, backward-compatible configuration, and push-latency optimizations. It aligns well with the required template structure.
Linked Issues check ✅ Passed The PR fully addresses issue #89 acceptance criteria: backend-agnostic BlobStore with S3/fs/IPFS backends [#89], atomic concurrency-safe extraction preserved [#89], backward-compatible config with legacy GITLAWB_TIGRIS_BUCKET honored [#89], and unit tests for storage::fs and repo_store [#89].
Out of Scope Changes check ✅ Passed All changes are directly related to the stated objectives: storage backend abstraction, RepoArchive implementation, RepoStore refactoring, configuration updates, and advisory-lock pool isolation. No unrelated churn detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/storage-abstraction

Comment @coderabbitai help to get the list of available commands.

@github-actions github-actions Bot added the needs-issue PR has no linked issue label Jun 22, 2026
@github-actions

Copy link
Copy Markdown

Thanks for the contribution. A couple of things will help us review this faster:

  • Link the issue this addresses (Closes #123). For protocol changes, open an issue first.

See CONTRIBUTING.md. Update the PR and these notes will clear automatically.

@beardthelion beardthelion added kind:feature New capability or surface crate:node gitlawb-node — the serving node and REST API subsystem:storage Blob/object store, Arweave, IPFS, archives sev:medium Degraded but workaround exists labels Jun 22, 2026
…wins

Replace the hardcoded Tigris client with a backend-agnostic object store.

BlobStore trait (get/put/head/delete/list) with backends:
- s3:   any S3-compatible service (Tigris, Cloudflare R2, AWS S3, MinIO, B2)
- fs:   local filesystem (atomic writes; unit-tested)
- ipfs: Kubo node via MFS (etag = content CID)

RepoArchive composes a bare repo into one repos/v1/{slug}/{repo}.tar.zst
object on top of any backend (tar+zstd moved here; git/tigris.rs removed).
RepoStore rewired from Option<TigrisClient> to Option<RepoArchive>.

Config (backward compatible): GITLAWB_STORAGE_BACKEND, GITLAWB_S3_BUCKET,
GITLAWB_S3_ENDPOINT, GITLAWB_S3_FORCE_PATH_STYLE, GITLAWB_STORAGE_FS_DIR,
GITLAWB_ASYNC_UPLOAD. GITLAWB_TIGRIS_BUCKET kept as a legacy alias.

Push-latency quick wins:
- Skip the redundant pre-write download when the cached etag already matches
  storage (previously up to two full-repo downloads per push).
- Write-back ack: ack the client before the durable upload completes; the
  advisory lock is held until upload finishes so cross-node consistency holds.

cargo check clean; storage::fs + repo_store unit tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kevincodex1 kevincodex1 force-pushed the feat/storage-abstraction branch from fa8a9f3 to 7b7e690 Compare June 23, 2026 08:03

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/gitlawb-node/src/storage/archive.rs (1)

162-184: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Clean up the temp dir when the swap phase fails.

The unpack-error path removes tmp_dir, but the swap phase does not: if remove_dir_all(local_path) or rename returns Err, the freshly-extracted .{file_name}.tmp-extract.<uuid> dir is orphaned. Because each extraction uses a fresh UUID and nothing reclaims leftovers, repeated failures (e.g. permission errors, open handles) leak temp dirs indefinitely.

Secondary concern: remove_dir_all(local_path) before rename destroys the only valid local copy first, so a subsequent rename failure leaves the repo missing — undermining sync_down_if_stale's local_path.exists() self-heal fallback.

🧹 Proposed fix: scope cleanup over the whole swap
-    let lock = publish_lock(local_path);
-    let _publish = lock.lock().expect("publish lock poisoned");
-    if local_path.exists() {
-        std::fs::remove_dir_all(local_path).context("removing stale repo dir")?;
-    }
-    std::fs::rename(&tmp_dir, local_path).context("swapping extracted repo into place")?;
-    Ok(())
+    let swap = (|| -> Result<()> {
+        let lock = publish_lock(local_path);
+        let _publish = lock.lock().expect("publish lock poisoned");
+        if local_path.exists() {
+            std::fs::remove_dir_all(local_path).context("removing stale repo dir")?;
+        }
+        std::fs::rename(&tmp_dir, local_path).context("swapping extracted repo into place")?;
+        Ok(())
+    })();
+    if let Err(e) = swap {
+        let _ = std::fs::remove_dir_all(&tmp_dir);
+        return Err(e);
+    }
+    Ok(())
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/storage/archive.rs` around lines 162 - 184, The swap
phase in the archive extraction does not clean up tmp_dir if either the
remove_dir_all(local_path) call or the rename call fails, leaving orphaned
temporary directories. Additionally, removing local_path before confirming the
rename succeeds risks losing the only valid copy if rename fails. Ensure that
tmp_dir is cleaned up in all failure paths during the swap phase by wrapping the
entire lock-acquisition and swap logic so that if any error occurs after
acquiring the publish lock, the tmp_dir is removed before returning the error.
Consider the order of operations to minimize data loss risk by only removing the
old local_path after confirming the rename will succeed.
🧹 Nitpick comments (2)
crates/gitlawb-node/src/storage/ipfs.rs (1)

72-73: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Avoid body.to_vec() to prevent doubling upload memory.

Line [72] copies the full blob before upload. For large repo archives this spikes memory usage and allocator pressure; use a streaming multipart part instead.

Suggested fix
-        let part = reqwest::multipart::Part::bytes(body.to_vec()).file_name("blob");
+        let part = reqwest::multipart::Part::stream(reqwest::Body::from(body)).file_name("blob");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/storage/ipfs.rs` around lines 72 - 73, The multipart
Part creation uses body.to_vec() which copies the entire blob into memory before
uploading, doubling memory usage for large files. Replace the
reqwest::multipart::Part::bytes() call with a streaming approach such as
reqwest::multipart::Part::stream() to avoid the unnecessary copy and pass the
body content directly as a stream instead of converting it to a vector first.
crates/gitlawb-node/src/storage/fs.rs (1)

114-117: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Don’t silently skip directory read failures in list().

Lines [114]-[117] convert I/O failures into partial results. Returning Ok(keys) after hidden errors can mislead future admin/GC/migration callers into acting on incomplete inventories.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/storage/fs.rs` around lines 114 - 117, The list()
function silently ignores directory read failures by using an Err(_) => continue
pattern in the std::fs::read_dir(&dir) match statement, which causes it to
return Ok(keys) with incomplete results. Replace this silent failure handling
with proper error propagation or handling that ensures callers are informed of
failures rather than acting on incomplete inventory data. Instead of continuing
on read_dir errors, return an Err result or log and return the appropriate error
indication so that admin, GC, and migration processes receive accurate
information about the success or failure of listing all keys.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 555-561: The `tokio::spawn(guard.release(true))` call has two
issues: the advisory lock release at the end of the `release()` method silently
discards errors with `let _ =`, making lock-release failures invisible to
operators, and the spawned task is not integrated with graceful shutdown,
potentially leaving locks acquired indefinitely. Fix by adding proper error
logging (via warn! or error!) for the advisory lock release query failure
instead of using `let _ =` to discard the error, and integrate the spawned task
with axum's graceful shutdown mechanism using a task tracker or shutdown signal
to ensure the task completes or is cancelled gracefully during shutdown.

In `@crates/gitlawb-node/src/git/repo_store.rs`:
- Around line 178-192: The cache-miss path currently uses unwrap_or(None) on the
head_etag call, which swallows any transient storage errors and returns None,
causing the function to fall through and return a non-existent local_path that
makes the caller see a misleading "repo not found" error. Replace the
unwrap_or(None) on the head_etag(...).await call with the ? operator to
propagate storage errors instead of silently converting them to None, so real
errors surface to the caller rather than being hidden as "not found" responses.
- Around line 79-116: The sync_down_if_stale function uses permissive self-heal
fallbacks for both read and write paths, but this causes a lost-update issue for
writes: when the remote etag differs from the cached etag (remote is newer) and
download fails, the function falls back to the stale local copy, which then gets
uploaded and overwrites the newer remote archive. Add a freshness-requirement
parameter to sync_down_if_stale that gates the self-heal behavior: when
freshness is required (write path), propagate errors from the HEAD check (around
lines 82-89) and download operation (around lines 105-115) instead of falling
back to the local copy; when freshness is not required (read path), keep the
current permissive fallback. Update calls from acquire_fresh to pass false
(freshness not required) and acquire_write to pass true (freshness required).

In `@crates/gitlawb-node/src/storage/fs.rs`:
- Around line 75-82: The fixed temporary file extension `tmp-put` created by
`path.with_extension("tmp-put")` on line 75 causes a race condition when
concurrent `put` operations occur on the same key, as both writes will use the
same temp file path and corrupt each other's data. Generate a unique temporary
filename for each write operation instead of using a fixed extension, such as by
appending a UUID, random suffix, or timestamp to the temporary filename to
ensure each concurrent write has its own isolated temp file before the atomic
rename into place.

In `@crates/gitlawb-node/src/storage/ipfs.rs`:
- Around line 166-168: The list() function currently skips non-success responses
from the files/ls endpoint by using continue, which silently treats
authentication, network, and server errors as empty successful listings. Instead
of continuing when !resp.status().is_success() is true, you should properly
handle the error by returning a Result error that propagates the failure to the
caller, ensuring that list() accurately reflects whether the operation succeeded
or failed rather than returning partial or empty results on error conditions.
- Around line 25-29: The reqwest::Client::new() call in the new method does not
configure any timeout values, which allows requests to the IPFS API to hang
indefinitely if the service becomes slow or unreachable. Replace the default
Client::new() call with a Client::builder() pattern that explicitly sets timeout
values using methods like timeout() to define a reasonable connection and
request timeout duration, then call build() to create the configured client.
This ensures that storage operations have bounded execution times and prevents
indefinite hangs.

In `@crates/gitlawb-node/src/storage/mod.rs`:
- Around line 67-70: The selection order documentation comment in the storage
module starting with "Selection order:" is incomplete and does not reflect the
actual implementation. Update the comment block to include the auto-detection of
the ipfs backend when the GITLAWB_IPFS_API environment variable is set, which
occurs in the actual implementation around lines 84-86. Add ipfs as an
additional auto-selection option to the documented order after the fs fallback
so the documentation accurately describes all auto-detection paths in the
storage backend selection logic.
- Around line 91-145: The function currently downgrades to local-only mode
(returning None) when explicitly configured backends fail to initialize, which
can silently accept writes without the intended durable backend and cause data
persistence issues. In the match statement, change the error handling for the
s3, fs, and ipfs branches so that when backend creation fails (in the Err
branches of S3BlobStore::new, FsBlobStore::new, and IpfsBlobStore::new), return
the error to propagate it up the call stack rather than logging a warning and
returning None. This ensures that if a backend is explicitly configured but
cannot be initialized, the application will fail at startup rather than silently
degrading to local-only mode.

---

Outside diff comments:
In `@crates/gitlawb-node/src/storage/archive.rs`:
- Around line 162-184: The swap phase in the archive extraction does not clean
up tmp_dir if either the remove_dir_all(local_path) call or the rename call
fails, leaving orphaned temporary directories. Additionally, removing local_path
before confirming the rename succeeds risks losing the only valid copy if rename
fails. Ensure that tmp_dir is cleaned up in all failure paths during the swap
phase by wrapping the entire lock-acquisition and swap logic so that if any
error occurs after acquiring the publish lock, the tmp_dir is removed before
returning the error. Consider the order of operations to minimize data loss risk
by only removing the old local_path after confirming the rename will succeed.

---

Nitpick comments:
In `@crates/gitlawb-node/src/storage/fs.rs`:
- Around line 114-117: The list() function silently ignores directory read
failures by using an Err(_) => continue pattern in the std::fs::read_dir(&dir)
match statement, which causes it to return Ok(keys) with incomplete results.
Replace this silent failure handling with proper error propagation or handling
that ensures callers are informed of failures rather than acting on incomplete
inventory data. Instead of continuing on read_dir errors, return an Err result
or log and return the appropriate error indication so that admin, GC, and
migration processes receive accurate information about the success or failure of
listing all keys.

In `@crates/gitlawb-node/src/storage/ipfs.rs`:
- Around line 72-73: The multipart Part creation uses body.to_vec() which copies
the entire blob into memory before uploading, doubling memory usage for large
files. Replace the reqwest::multipart::Part::bytes() call with a streaming
approach such as reqwest::multipart::Part::stream() to avoid the unnecessary
copy and pass the body content directly as a stream instead of converting it to
a vector first.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3b25ebb6-e61a-4ce6-a7ae-1636bdde8f45

📥 Commits

Reviewing files that changed from the base of the PR and between 97d37fd and 7b7e690.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • crates/gitlawb-node/Cargo.toml
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/config.rs
  • crates/gitlawb-node/src/git/mod.rs
  • crates/gitlawb-node/src/git/repo_store.rs
  • crates/gitlawb-node/src/main.rs
  • crates/gitlawb-node/src/storage/archive.rs
  • crates/gitlawb-node/src/storage/fs.rs
  • crates/gitlawb-node/src/storage/ipfs.rs
  • crates/gitlawb-node/src/storage/mod.rs
  • crates/gitlawb-node/src/storage/s3.rs
💤 Files with no reviewable changes (1)
  • crates/gitlawb-node/src/git/mod.rs

Comment thread crates/gitlawb-node/src/api/repos.rs Outdated
Comment thread crates/gitlawb-node/src/git/repo_store.rs
Comment thread crates/gitlawb-node/src/git/repo_store.rs
Comment thread crates/gitlawb-node/src/storage/fs.rs Outdated
Comment thread crates/gitlawb-node/src/storage/ipfs.rs
Comment thread crates/gitlawb-node/src/storage/ipfs.rs
Comment thread crates/gitlawb-node/src/storage/mod.rs Outdated
Comment thread crates/gitlawb-node/src/storage/mod.rs
Major:
- repo_store: add require_fresh to sync_down_if_stale. Write path
  (acquire_write) fails closed instead of falling back to a stale local copy
  that a later upload would use to clobber a newer remote archive (lost update);
  read path (acquire_fresh) keeps self-heal.
- fs: unique per-write temp name (uuid) + cleanup on failure, so concurrent
  puts to the same key can't corrupt each other.
- storage::build: return Result and fail closed — a configured-but-unavailable
  or misconfigured backend now aborts boot instead of silently downgrading to
  local-only (durability drift). main.rs propagates the error.
- ipfs: give the HTTP client connect/total timeouts so an unresponsive Kubo
  API can't hang push/write flows.

Minor/trivial:
- repo_store::acquire: propagate storage HEAD errors on cache miss instead of
  collapsing them into a misleading 404.
- repo_store: log advisory-lock release failures instead of discarding them.
- archive: clean up the extracted temp dir if the swap step fails.
- fs::list / ipfs::list: propagate read/HTTP errors instead of silently
  returning a partial listing as success.
- ipfs::put: stream the body (Part::stream_with_length) instead of to_vec,
  halving peak upload memory.
- storage::build doc: document IPFS auto-detection.

Adds a concurrent-put FS test. Deferred (follow-up): integrating the
write-back release task with graceful-shutdown drain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kevincodex1 kevincodex1 requested a review from beardthelion June 23, 2026 11:46
@kevincodex1

Copy link
Copy Markdown
Contributor Author

please have a look into this @beardthelion

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the merged state against current main (clean merge), built it, ran the storage and repo_store unit tests, and traced the write and consistency paths. The BlobStore abstraction and the Tigris-to-archive migration are clean and complete, and the fs backend's atomic swap and per-method key validation hold up. My concerns are all in the push-latency changes: the write-back default has a data-loss path, and it leans on an advisory-lock pattern that isn't sound over a connection pool. Highest first.

Findings

  • [P1] Default async_upload to false; ack-before-upload can lose an acked push on a rolling deploy. crates/gitlawb-node/src/config.rs (GITLAWB_ASYNC_UPLOAD is default_value_t = true), api/repos.rs:561. With it on, git_receive_pack acks the client and uploads in a detached task. If the process stops in that window (a SIGTERM during a rolling deploy is the routine trigger, not just a crash), storage keeps the old archive. On restart the in-process versions cache is empty, so the write-path sync_down_if_stale (require_fresh=true) finds no cached etag, downloads the stale remote, and overwrites the newer local copy at repo_store.rs:102-113. Lazy migration does not save it: acquire only uploads when storage has no object (:154-175), so an already-present stale archive is never re-synced. The "local copy survives; lazy migration re-syncs" comment does not hold. Default this to false, or add a durable local-newer marker plus a visible signal on upload failure and a shutdown drain before making it the default.

  • [P2, pre-existing] The advisory lock is acquired and released on different pooled connections. repo_store.rs:233 locks via fetch_one(&self.pool); :260/:494 unlock via execute(&self.pool). Session-scoped Postgres advisory locks only release on the connection that took them, and sqlx hands out an arbitrary pooled connection per query, so the unlock can land on a different session and silently no-op (the boolean result is not checked) while the lock persists. A later same-session acquire is re-entrant and succeeds spuriously; a different session blocks 60s then 500s. This is already on main, so the PR does not introduce it, but the write-back change leans on the lock being held through the upload and now runs the unlock from a detached task that is dropped on crash, so it depends on a guarantee that is not actually there. Worth fixing as part of leaning on it: hold one connection (pool.acquire()) for the lock's lifetime and run lock+unlock on it, or use a transaction-scoped pg_advisory_xact_lock.

  • [P2] The S3 client sets no request timeout. storage/s3.rs:31. S3BlobStore::new builds the client with no timeout config, and the AWS Rust SDK sets none by default, so a hung S3 endpoint blocks the calling task unbounded. Under async_upload that task holds the advisory lock, so one stuck upload blocks every later push to the repo. The IPFS client already sets connect and total timeouts; S3 should match, via aws_sdk_s3::config::timeout::TimeoutConfig on the client builder or a tokio::time::timeout wrapper.

  • [P2] "Fail closed when the backend is unreachable" only covers missing config, not connectivity. storage/mod.rs:69. The doc says a configured-but-unavailable backend returns Err, but S3BlobStore::new only loads credentials and builds the client with no network call, so an unreachable bucket boots clean and fails on the first live request. Scope the comment to config validation, or add a cheap probe against the bucket at boot if real fail-closed is intended.

  • [P2] The risky new paths have no tests. compress_repo/decompress_repo are pure filesystem functions but there is no round-trip test and nothing covering the "corrupt archive leaves the original intact" atomicity claim, which is the whole download-durability story. sync_down_if_stale's require_fresh=true (the lost-update guard) and the etag-skip are also untested (for_testing forces archive=None). Both are cheap to add with the fs backend and a test constructor that accepts an archive.

P3s, briefly: IPFS list() doesn't reject traversal in its prefix the way the other four methods do (ipfs.rs:159), latent since it has no caller (a traversal-segment check that still allows the empty list-all prefix, or just drop list()); absent-vs-error detection keys on Kubo error-string substrings (ipfs.rs:68), and since Kubo's error Code is always 0 there's nothing structured to switch on, so a pinned comment is the realistic fix; IPFS put() returns Err after a successful write when the follow-up stat fails (ipfs.rs:104); and several doc comments still say "Tigris" after the rename (repo_store.rs:131-133,212,282, archive.rs:4).

The abstraction is the right call and the migration is complete. Once the async_upload default and the advisory-lock soundness are sorted I'm glad to take another pass. @kevincodex1

…, tests)

P1:
- config: default GITLAWB_ASYNC_UPLOAD to false. Write-back ack opened a
  data-loss window (crash between ack and upload → stale remote overwrites
  newer local on restart); strict upload-before-ack is now the default and the
  latency optimization is opt-in.

P2:
- repo_store: hold ONE pooled connection for the advisory lock's lifetime.
  pg_advisory_lock is session-scoped, so acquiring and releasing on different
  pooled connections let unlock silently no-op while the lock lingered. Acquire,
  sync-error release, and final release now all run on the pinned connection
  (RepoWriteGuard owns a PoolConnection instead of the pool).
- s3: add TimeoutConfig (60s attempt / 300s operation) so a hung endpoint can't
  block the task — and, under async_upload, the advisory lock — unbounded.
- storage::build: scope the fail-closed doc to config + client construction;
  note that live connectivity (e.g. unreachable S3 bucket) surfaces on first use.

P3:
- ipfs: reject `..`/`.` traversal in list() prefixes; make a failed post-write
  stat non-fatal (the write succeeded — return without an etag).
- docs: replace stale generic "Tigris" references with "storage" across
  repo_store/api/state/archive; keep legitimate Tigris-as-S3-example mentions.

Tests: archive compress/decompress round-trip + atomicity (corrupt archive
leaves existing copy intact) + fs upload/download; repo_store sync_down_if_stale
etag-skip and require_fresh fail-closed-vs-fallback.

Skipped with reason: IPFS string-matched error detection — Kubo's HTTP API
returns 500 + JSON message with no stable error codes, so substring matching is
the pragmatic norm; a structured-error layer is disproportionate here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kevincodex1 kevincodex1 requested a review from beardthelion June 23, 2026 16:33
@kevincodex1

Copy link
Copy Markdown
Contributor Author

please check again @beardthelion

@beardthelion beardthelion dismissed their stale review June 23, 2026 17:22

Superseded: the five findings from this review are resolved on d35a1c8; re-reviewing the new head separately.

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this addresses the prior round cleanly. I re-reviewed the merged state on the new head: async_upload now defaults false, the advisory lock is acquired and released on a single pinned connection (sound on every exit path I traced), the S3 client has operation/attempt timeouts, ipfs put no longer fails on a post-write stat, and the new compress/decompress and sync_down_if_stale tests exercise real behavior. All five prior findings are resolved.

One regression the lock fix introduces, which is why I'm holding for another round.

Findings

  • [P2] Bound concurrent pushes per DID before pinning a pool connection per push. git/repo_store.rs:236 (the pool.acquire() now held for the whole write), server.rs:178 (push routes). acquire_write holds one connection from the shared pool across the write, the 60s lock-retry, and the upload, and git_write_routes has no per-DID rate limit (unlike creation_routes at server.rs:96). Because identities are permissionless, a single actor can fire enough concurrent or slow pushes to exhaust the default 10-connection pool (db/mod.rs:269, no PgPoolOptions), after which every DB-touching handler node-wide blocks on pool.acquire() for the ~30s acquire timeout and then fails. It's bounded (30s acquire timeout, 300s S3 op timeout) and self-recovers, but it's repeatable and reachable, and the previous connection-per-query design didn't have it. Bound concurrent writes per DID on git_write_routes (the rate_limit_by_did layer creation already uses, or a semaphore), and/or give lock connections a dedicated, sized pool. Not pg_advisory_xact_lock here: the lock has to outlive the transaction (it is held across the post-commit upload), so a transaction-scoped lock would release before the upload and reopen the stale-archive window the lock exists to close.

  • [P3] Add #[must_use] to RepoWriteGuard. git/repo_store.rs:456. The connection pinning is correct, but a guard dropped without release() returns its connection to the pool with the session advisory lock still held (sqlx does not reset session state on check-in), so it lingers until that backend connection is evicted. Every current caller releases on all paths, so this is not reachable today; #[must_use] is cheap insurance against a future caller that forgets.

The lock-soundness approach is right and the rest is solid. Once the push path is bounded I'll approve. @kevincodex1

Follow-up to the connection-pinning fix (beardthelion review):

- Pool exhaustion (P2): pinning a connection per push from the shared pool let
  a burst of concurrent/slow pushes drain it and block every DB handler
  node-wide. Give advisory locks a SEPARATE bounded pool (16 conns) created in
  main; acquire_write pins from it. RepoStore now holds only that lock pool —
  it touches Postgres solely for locks — so the handler pool is never starved
  by pushes. Removed the now-unused Db::pool() accessor.
- Lock leak via dropped guard (P3): add #[must_use] to RepoWriteGuard. Dropping
  it without release() returns the connection to the pool with the session
  advisory lock still held until that backend connection is evicted; the
  annotation makes an accidental drop a compile-time warning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kevincodex1

Copy link
Copy Markdown
Contributor Author

@coderabbitai please do full review of this PR

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

@kevincodex1 Sure! I'll kick off a full review of this PR now. The changes look substantial — replacing the Tigris-specific client with a storage-agnostic BlobStore layer across s3, fs, and ipfs backends, plus the RepoArchive abstraction, advisory-lock pool isolation, and async upload path. I'll examine all of it thoroughly.

✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/gitlawb-node/src/git/repo_store.rs (1)

161-190: 🗄️ Data Integrity & Integration | 🔴 Critical | 🏗️ Heavy lift

Serialize background archive uploads with the write lock.

Both lazy migration and init() upload archives from background tasks without holding the repo advisory lock. A first push can write and upload a newer archive while one of these older background uploads is still compressing; if the background put finishes last, it overwrites the newer storage object. This is especially reachable for a newly created repo where the empty initial archive races the first push.

Move these uploads behind the same advisory-lock path and re-check storage under the lock before upload, or make init() upload synchronously before the repo is exposed.

Also applies to: 316-329

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/git/repo_store.rs` around lines 161 - 190, The
background archive uploads in the spawned async task lack serialization with the
repo advisory lock, creating a race condition where an older background upload
can overwrite a newer archive pushed by the user. Acquire the repo advisory lock
before performing the archive.upload() call within the tokio::spawn task,
re-check if the archive already exists in storage after acquiring the lock (to
avoid redundant uploads), and only proceed with the upload if the lock is held
and the storage check confirms it's needed. Apply this same fix to the other
location mentioned at lines 316-329 where archive uploads occur outside the
advisory lock path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 556-560: The comment describing the write-back ack behavior
incorrectly claims that lazy migration provides recovery for the
crash-durability window when async uploads are deferred. This is inaccurate
because lazy migration does not reliably re-sync when an older archive already
exists in storage. Update the comment to accurately reflect the actual
durability tradeoff: if the node stops after sending the ack but before the
upload completes, storage remains stale and a restart may roll the local cache
back from storage. Remove the claim that lazy migration re-syncs and instead
describe the realistic recovery behavior and the latency-durability tradeoff
being made.

In `@crates/gitlawb-node/src/config.rs`:
- Around line 155-162: The `guard.release()` method returns unit type `()`
instead of a `Result`, causing upload errors to be silently logged and never
propagated to callers. This violates the durability guarantee of the strict path
when `async_upload=false`, where clients receive acknowledgment even though the
durable upload failed. Modify the `release()` method to return a `Result` type
so that errors can be properly propagated, then update all write endpoint call
sites (repos, pulls, issues) that call `guard.release().await` to handle and
propagate the `Result` in the strict path, ensuring the client only receives an
acknowledgment after confirmed successful durable upload.

In `@crates/gitlawb-node/src/git/repo_store.rs`:
- Around line 346-356: The archive.upload() error handling currently only logs
warnings without propagating the failure, causing synchronous write operations
to succeed even when durable storage updates fail. Modify the function
containing this match statement to return a Result type, and change the Err(e)
case to return an error instead of just logging it. For synchronous callers
(async_upload=false), let them handle the returned error to fail the request
appropriately. For asynchronous paths, spawn a background task that can still
log any returned errors. Apply these changes to all similar upload failure
handling patterns throughout the file, including the related code at lines
503-516.
- Around line 519-521: When the write fails and is skipped (in the else block
where the warn! log is), the cached etag for this repo must be invalidated to
prevent subsequent write operations from using stale cache information. Add a
call after the warn! statement to drop or clear the cached etag version
associated with self.repo_name, ensuring that the next write operation will
perform a fresh pre-write download and not operate on inconsistent local state.

In `@crates/gitlawb-node/src/storage/archive.rs`:
- Around line 180-190: The current implementation deletes the old repo before
attempting the rename, which means if the rename fails the old repo is lost. To
fix this, instead of immediately removing the old repo with remove_dir_all when
local_path exists, first move it to a temporary backup location using
std::fs::rename. Then attempt to rename the extracted tmp_dir into place. If the
rename succeeds, clean up the backup directory. If the rename fails (in the
error handling block), restore the original repo by renaming the backup back to
local_path before removing the failed tmp_dir, ensuring the old repo is
preserved on failure.

In `@crates/gitlawb-node/src/storage/fs.rs`:
- Around line 98-103: In the `head` method, replace the error detection logic
that uses `path.exists()` with explicit error kind checking by checking if
`e.kind() == std::io::ErrorKind::NotFound` instead, since `exists()` returns
false for both missing files and permission/IO errors, causing real errors to be
silently masked. Additionally, locate the code around line 127 that contains
`rd.flatten()` in what appears to be a directory listing operation and remove
the `flatten()` call to ensure per-entry errors are properly propagated with
context instead of being silently dropped, as dropping errors causes partial
results to be reported as success and misleads callers that depend on complete
directory listings.

In `@crates/gitlawb-node/src/storage/mod.rs`:
- Around line 47-50: The BlobStore trait methods `get` and `put` require
buffering entire objects as Bytes in memory, which causes out-of-memory issues
for large repository archives. Add streaming/spooling APIs to the BlobStore
trait (such as stream-based get and put methods) that can handle objects
incrementally without loading the full content into memory at once. Modify the
archive upload/download code paths to use these new streaming methods instead of
the current whole-object `get` and `put` contract, allowing large `.tar.zst`
files to be processed in chunks.
- Around line 148-152: The validate_key function is missing validation for
backslashes in blob keys, which could cause path confusion on Windows since blob
keys are documented as forward-slash-delimited. Add a check in the validation
logic that rejects any key containing a backslash character using the contains
method, similar to the existing null byte check. Include this validation
alongside the other blob key checks (the traversal segment check and the
absolute path/null byte check) to ensure backslashes are caught during input
validation rather than relying on downstream conversion like the list function
does.

---

Outside diff comments:
In `@crates/gitlawb-node/src/git/repo_store.rs`:
- Around line 161-190: The background archive uploads in the spawned async task
lack serialization with the repo advisory lock, creating a race condition where
an older background upload can overwrite a newer archive pushed by the user.
Acquire the repo advisory lock before performing the archive.upload() call
within the tokio::spawn task, re-check if the archive already exists in storage
after acquiring the lock (to avoid redundant uploads), and only proceed with the
upload if the lock is held and the storage check confirms it's needed. Apply
this same fix to the other location mentioned at lines 316-329 where archive
uploads occur outside the advisory lock path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 604e1276-dc51-4f03-a84b-b577d38cfffe

📥 Commits

Reviewing files that changed from the base of the PR and between 97d37fd and e179daa.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • crates/gitlawb-node/Cargo.toml
  • crates/gitlawb-node/src/api/issues.rs
  • crates/gitlawb-node/src/api/pulls.rs
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/config.rs
  • crates/gitlawb-node/src/db/mod.rs
  • crates/gitlawb-node/src/git/mod.rs
  • crates/gitlawb-node/src/git/repo_store.rs
  • crates/gitlawb-node/src/git/tigris.rs
  • crates/gitlawb-node/src/main.rs
  • crates/gitlawb-node/src/state.rs
  • crates/gitlawb-node/src/storage/archive.rs
  • crates/gitlawb-node/src/storage/fs.rs
  • crates/gitlawb-node/src/storage/ipfs.rs
  • crates/gitlawb-node/src/storage/mod.rs
  • crates/gitlawb-node/src/storage/s3.rs
💤 Files with no reviewable changes (3)
  • crates/gitlawb-node/src/git/mod.rs
  • crates/gitlawb-node/src/db/mod.rs
  • crates/gitlawb-node/src/git/tigris.rs

Comment thread crates/gitlawb-node/src/api/repos.rs Outdated
Comment thread crates/gitlawb-node/src/config.rs
Comment thread crates/gitlawb-node/src/git/repo_store.rs Outdated
Comment thread crates/gitlawb-node/src/git/repo_store.rs Outdated
Comment thread crates/gitlawb-node/src/storage/archive.rs
Comment thread crates/gitlawb-node/src/storage/fs.rs Outdated
Comment on lines +47 to +50
async fn get(&self, key: &str) -> Result<Option<Bytes>>;

/// Store an object, returning its metadata (including the new etag).
async fn put(&self, key: &str, body: Bytes) -> Result<ObjectMeta>;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Performance & Scalability | 🟠 Major | 🏗️ Heavy lift

Avoid making large repo archives a whole-memory BlobStore operation.

get/put require entire objects as Bytes, so repo archive upload/download paths must buffer the full .tar.zst in memory. Large repositories can OOM the node during push/fetch; consider streaming/spooling APIs for archive objects instead of a whole-object contract.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/storage/mod.rs` around lines 47 - 50, The BlobStore
trait methods `get` and `put` require buffering entire objects as Bytes in
memory, which causes out-of-memory issues for large repository archives. Add
streaming/spooling APIs to the BlobStore trait (such as stream-based get and put
methods) that can handle objects incrementally without loading the full content
into memory at once. Modify the archive upload/download code paths to use these
new streaming methods instead of the current whole-object `get` and `put`
contract, allowing large `.tar.zst` files to be processed in chunks.

Comment on lines +148 to +152
if key.split('/').any(|seg| seg == ".." || seg == ".") {
anyhow::bail!("blob key contains traversal segment: {key}");
}
if key.starts_with('/') || key.contains('\0') {
anyhow::bail!("blob key is absolute or contains null byte: {key}");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's explore the storage module structure
fd -t f -e rs . crates/gitlawb-node/src/storage/

# Also check what's in the directory
ls -la crates/gitlawb-node/src/storage/

Repository: Gitlawb/node

Length of output: 750


🏁 Script executed:

# Find and examine FsBlobStore implementation
rg -t rs "FsBlobStore" --max-count 20 -A 5 -B 2

Repository: Gitlawb/node

Length of output: 177


🏁 Script executed:

# Look at the full storage/mod.rs file around the lines in question
sed -n '140,160p' crates/gitlawb-node/src/storage/mod.rs

Repository: Gitlawb/node

Length of output: 642


🏁 Script executed:

# Search for documentation about blob keys
rg -t rs "blob key" --max-count 30 -B 2 -A 2

Repository: Gitlawb/node

Length of output: 177


🏁 Script executed:

# Search for FsBlobStore with proper rg syntax
rg "FsBlobStore" --max-count 30 -A 5 -B 2 crates/gitlawb-node/src/storage/

Repository: Gitlawb/node

Length of output: 5691


🏁 Script executed:

# Look at fs.rs to understand how FsBlobStore handles blob keys
cat -n crates/gitlawb-node/src/storage/fs.rs

Repository: Gitlawb/node

Length of output: 10079


🏁 Script executed:

# Search for how validate_key is used
rg "validate_key" --max-count 20 -B 3 -A 3 crates/gitlawb-node/src/storage/

Repository: Gitlawb/node

Length of output: 6806


🏁 Script executed:

# Search for blob key documentation in comments
rg "blob key" -B 3 -A 3 crates/gitlawb-node/src/

Repository: Gitlawb/node

Length of output: 1661


🏁 Script executed:

# Check if there are any tests for backslash rejection in blob keys
rg "backslash|\\\\|platform separator" crates/gitlawb-node/src/storage/ -A 3 -B 3

Repository: Gitlawb/node

Length of output: 6043


🏁 Script executed:

# Look for any documentation about blob key format
rg -B 5 -A 5 "forward-slash|forward slash|blob key format" crates/gitlawb-node/src/storage/

Repository: Gitlawb/node

Length of output: 931


🏁 Script executed:

# Check the list function to see why they handle backslashes there
sed -n '116,143p' crates/gitlawb-node/src/storage/fs.rs

Repository: Gitlawb/node

Length of output: 1376


🏁 Script executed:

# Search for any comments or issues related to Windows path handling
rg "Windows|windows|platform|separator" crates/gitlawb-node/src/storage/ -B 2 -A 2

Repository: Gitlawb/node

Length of output: 680


Reject backslashes in blob keys during validation.

Blob keys are documented as forward-slash-delimited. On Windows, PathBuf::join() interprets backslashes as path separators, which could lead to path confusion despite the secondary starts_with check. The list() function already converts backslashes (line 132 in fs.rs), indicating platform awareness—input validation should enforce the documented format. Add a backslash check to validate_key():

Suggested fix
-    if key.starts_with('/') || key.contains('\0') {
-        anyhow::bail!("blob key is absolute or contains null byte: {key}");
+    if key.starts_with('/') || key.contains('\\') || key.contains('\0') {
+        anyhow::bail!("blob key is absolute, contains a platform separator, or contains null byte: {key}");
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/storage/mod.rs` around lines 148 - 152, The
validate_key function is missing validation for backslashes in blob keys, which
could cause path confusion on Windows since blob keys are documented as
forward-slash-delimited. Add a check in the validation logic that rejects any
key containing a backslash character using the contains method, similar to the
existing null byte check. Include this validation alongside the other blob key
checks (the traversal segment check and the absolute path/null byte check) to
ensure backslashes are caught during input validation rather than relying on
downstream conversion like the list function does.

CodeRabbit round (data-integrity/consistency):

- release()/release_after_write() now return Result. The push strict path,
  merge, fork, and issue writes surface a durable-upload failure to the client
  instead of acking a write that never reached storage; the async write-back
  path logs it. (findings 3, 4)
- Cache invalidation on failure: on a failed write OR a failed upload, drop the
  cached etag so the next write re-downloads and reconciles rather than trusting
  un-persisted local state. (finding 5)
- Background uploads serialized: new upload_under_lock() takes the per-repo
  advisory lock. init() uploads synchronously under it (fail closed); lazy
  migration uploads under it (skip-if-exists). Stops an older snapshot from
  clobbering a concurrent locked push. (finding 2)
- archive swap no longer destroys the old repo up front: move it to a backup,
  rename the new copy in, delete the backup on success, restore it on rename
  failure — local_path is never left missing. (finding 6)
- fs head() distinguishes NotFound from permission/IO errors via ErrorKind
  instead of path.exists(). (finding 1)
- fs list() propagates per-entry read errors instead of dropping them via
  flatten(). (finding 7)
- Reworded the async-upload comment to state the real tradeoff: on a crash
  after ack, storage stays stale until the repo's next successful push — it is
  not auto-reconciled. (finding 8)

Already in place from earlier commits (verified): IPFS list() traversal guard,
IPFS non-fatal post-write stat, S3 client timeouts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 1187-1197: In the error handler for the release_after_write call
(within the map_err block), add cleanup logic to remove the local fork directory
stored in disk_path before returning the AppError. This prevents orphan data
from accumulating when the durable upload fails, which can cause issues on retry
attempts due to the destination path already existing. The cleanup should happen
in the error path before constructing and returning the AppError with the
failure message.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5d006b80-0156-44ab-a8a3-778966abe6f7

📥 Commits

Reviewing files that changed from the base of the PR and between e179daa and bd7c026.

📒 Files selected for processing (6)
  • crates/gitlawb-node/src/api/issues.rs
  • crates/gitlawb-node/src/api/pulls.rs
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/git/repo_store.rs
  • crates/gitlawb-node/src/storage/archive.rs
  • crates/gitlawb-node/src/storage/fs.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • crates/gitlawb-node/src/storage/archive.rs
  • crates/gitlawb-node/src/storage/fs.rs
  • crates/gitlawb-node/src/git/repo_store.rs

Comment on lines +1187 to +1197
// Upload fork to storage — fail closed if the durable upload fails rather
// than reporting a fork that only exists on this node's local disk.
state
.repo_store
.release_after_write(&forker_did, &fork_name)
.await;
.await
.map_err(|e| {
AppError::Git(format!(
"fork created locally but durable upload failed: {e}"
))
})?;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Clean up the local fork directory on durable-upload failure.

If release_after_write(...) fails, this returns an error but leaves the cloned disk_path behind. That creates orphan data and can make retries fail due to an existing destination path.

♻️ Proposed fix
-    state
-        .repo_store
-        .release_after_write(&forker_did, &fork_name)
-        .await
-        .map_err(|e| {
-            AppError::Git(format!(
-                "fork created locally but durable upload failed: {e}"
-            ))
-        })?;
+    if let Err(e) = state
+        .repo_store
+        .release_after_write(&forker_did, &fork_name)
+        .await
+    {
+        if let Err(cleanup_err) = tokio::fs::remove_dir_all(&disk_path).await {
+            tracing::warn!(
+                path = %disk_path.display(),
+                err = %cleanup_err,
+                "failed to clean up local fork after durable upload failure"
+            );
+        }
+        return Err(AppError::Git(format!(
+            "fork created locally but durable upload failed: {e}"
+        )));
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/api/repos.rs` around lines 1187 - 1197, In the error
handler for the release_after_write call (within the map_err block), add cleanup
logic to remove the local fork directory stored in disk_path before returning
the AppError. This prevents orphan data from accumulating when the durable
upload fails, which can cause issues on retry attempts due to the destination
path already existing. The cleanup should happen in the error path before
constructing and returning the AppError with the failure message.

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [P2] Clean up the local fork directory when fork creation fails after the clone
    crates/gitlawb-node/src/api/repos.rs:1187-1218
    After git clone --mirror succeeds, release_after_write is called without any cleanup on failure. If the durable upload fails, the handler returns an error but leaves the cloned mirror at disk_path. A later fork with the same name (or a retry of this one) will then fail because the directory already exists. The same leak happens if state.db.create_repo fails after the upload. Remove disk_path before returning from either error path. CodeRabbit's latest review explicitly requested this cleanup.

  • [P2] Don't block the Tokio runtime in FsBlobStore::head and ::put
    crates/gitlawb-node/src/storage/fs.rs:39-51 and crates/gitlawb-node/src/storage/fs.rs:69-107
    head and put both call Self::meta_of, which uses std::fs::metadata directly inside an async trait method. That blocks the async executor thread while stat'ing local files. For put, move the stat into the existing spawn_blocking closure; for head, use tokio::fs::metadata (or a spawn_blocking call) instead of the synchronous helper.

  • [P3] Advisory-lock acquisition timeout is shorter than the backend operation timeout
    crates/gitlawb-node/src/git/repo_store.rs:234-250
    acquire_write retries pg_try_advisory_lock for 60 seconds, but the S3 backend sets a 300-second operation timeout (crates/gitlawb-node/src/storage/s3.rs:44-47) and the IPFS backend sets the same 300-second total timeout (crates/gitlawb-node/src/storage/ipfs.rs:30-31). A single large-repo upload can therefore hold the advisory lock long enough for concurrent pushes to the same repo to be rejected after only 60 seconds. Align the lock-acquisition timeout with the configured storage operation timeout, or use a blocking advisory-lock call with a matching timeout.

@beardthelion beardthelion dismissed their stale review June 24, 2026 13:00

Both findings from this review are resolved on the current head: advisory locks now use a dedicated bounded pool (no handler-pool starvation), and RepoWriteGuard is #[must_use]. Dismissing so the state matches the current head; jatmn's open findings still stand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

crate:node gitlawb-node — the serving node and REST API kind:feature New capability or surface sev:medium Degraded but workaround exists subsystem:storage Blob/object store, Arweave, IPFS, archives

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(storage): backend-agnostic BlobStore layer (S3 / filesystem / IPFS)

3 participants