
feat: standalone checksum-only integrity API (#13) #50

Open
27Bslash6 wants to merge 7 commits into main from feat/checksum-only-api

27Bslash6 commented Jun 16, 2026

Contributor

Closes the cachekit-core half of #13. (PyO3 bindings + py-vs-FFI benchmark land in a follow-up cachekit-py PR once 0.3.0 publishes.)

What

Exposes xxHash3-64 integrity as a standalone primitive, decoupled from compression — two free functions gated on feature = "checksum" alone:

  • checksum(data: &[u8]) -> [u8; 8]
  • verify_checksum(data: &[u8], expected: &[u8; 8]) -> bool

Usable with default-features = false, features = ["checksum"] (no LZ4/messagepack). This unblocks callers (e.g. Python Arrow/JSON serializers) that want the fast 8-byte xxHash3 checksum where LZ4 compression is ineffective, without reaching for Blake3.
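A minimal sketch of the shape of these two functions, assuming only what the description states: an 8-byte big-endian digest and a plain equality check. To keep it compilable with the standard library alone, the sketch swaps in FNV-1a as a stand-in 64-bit hash; `stand_in_hash64` is a hypothetical helper and is not what cachekit-core ships, which hashes with xxHash3-64.

```rust
// Stand-in 64-bit hash (FNV-1a) so the sketch builds with std only.
// ASSUMPTION: illustrative substitute; the PR uses xxHash3-64 instead.
fn stand_in_hash64(data: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    data.iter().fold(OFFSET_BASIS, |hash, &byte| {
        (hash ^ u64::from(byte)).wrapping_mul(PRIME)
    })
}

/// Same signature as the PR's `checksum`: the 64-bit hash as 8 big-endian bytes.
pub fn checksum(data: &[u8]) -> [u8; 8] {
    stand_in_hash64(data).to_be_bytes()
}

/// Same signature as the PR's `verify_checksum`: recompute and compare with
/// ordinary (non-constant-time) equality, which is enough for corruption checks.
pub fn verify_checksum(data: &[u8], expected: &[u8; 8]) -> bool {
    checksum(data) == *expected
}
```

A caller would compute `checksum(payload)` when writing, store the 8 bytes next to the payload, and call `verify_checksum(payload, &stored)` when reading; a `false` result signals corruption rather than tampering.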

DRY

StorageEnvelope::{new,extract} now consume the new primitive — one canonical xxHash3-64 definition. The inline xxh3_64 is gone from byte_storage.rs. No wire-format change: the stored checksum bytes are byte-identical (big-endian), test-locked by envelope_embeds_canonical_checksum.
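The delegation described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the crate's code: `Envelope` and its fields are hypothetical, compression is omitted entirely, and FNV-1a stands in for xxHash3-64 so the block compiles with std alone. Only the `ByteStorageError::ChecksumMismatch` name and the two function signatures come from the PR.

```rust
// ASSUMPTION: FNV-1a stand-in; the real primitive is xxHash3-64.
fn stand_in_hash64(data: &[u8]) -> u64 {
    data.iter().fold(0xcbf2_9ce4_8422_2325u64, |hash, &byte| {
        (hash ^ u64::from(byte)).wrapping_mul(0x0000_0100_0000_01b3)
    })
}

pub fn checksum(data: &[u8]) -> [u8; 8] {
    stand_in_hash64(data).to_be_bytes()
}

pub fn verify_checksum(data: &[u8], expected: &[u8; 8]) -> bool {
    checksum(data) == *expected
}

#[derive(Debug, PartialEq)]
pub enum ByteStorageError {
    ChecksumMismatch,
}

// Hypothetical, simplified envelope: the real StorageEnvelope also compresses.
pub struct Envelope {
    pub payload: Vec<u8>,
    pub checksum: [u8; 8],
}

impl Envelope {
    /// Write path: embed the canonical checksum instead of hashing inline.
    pub fn new(data: &[u8]) -> Self {
        Envelope {
            payload: data.to_vec(),
            checksum: checksum(data),
        }
    }

    /// Read path: delegate verification and surface a specific error variant.
    pub fn extract(&self) -> Result<&[u8], ByteStorageError> {
        if verify_checksum(&self.payload, &self.checksum) {
            Ok(&self.payload)
        } else {
            Err(ByteStorageError::ChecksumMismatch)
        }
    }
}
```

Because both paths call the same two functions, the stored bytes and the verification rule cannot drift apart, which is the property a DRY-guard test such as envelope_embeds_canonical_checksum is meant to lock in.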

Design notes (deliberate — please don't "fix")

  • checksum() is intentionally unbounded (no size cap): a pure O(n) hash over already-materialized bytes; the MAX_UNCOMPRESSED_SIZE cap is StorageEnvelope's decompression-bomb concern, not applicable here.
  • verify_checksum is plain (non-constant-time) equality: correct for a non-cryptographic corruption check. Tamper-resistance is AES-256-GCM's job.

Tests

  • Determinism, empty-input (value-pinned), match/reject, single-bit-flip rejection.
  • Known-answer test locking algorithm + big-endian order (checksum(b"cachekit-kat")), reproduced independently against Python xxhash.
  • DRY-guard (write path) + tightened fail-open guard (extract returns the ChecksumMismatch variant on corruption).

Verification

cargo fmt --check · cargo clippy --all-features -- -D warnings · cargo test --all-features (198 pass) · cargo test --no-default-features --features checksum --lib (feature-gating) — all green.

Release

The feat: prefix means release-please cuts 0.3.0, co-tenant with #48 (perf: borrow input…). Both are already on this branch's base.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a standalone checksum module (feature-gated) for data integrity verification.
  • Improvements

    • Refactored checksum verification to use a centralised module approach.
    • Updated documentation to clarify that checksums can be used independently of compression.

Extracts xxHash3-64 checksum and verify_checksum as a standalone public primitive in src/checksum.rs, gated on the 'checksum' feature alone. Usable without compression or messagepack. Includes 6 unit tests: 5 behavioral + 1 known-answer regression locking algorithm and big-endian byte order. Wire value is identical to StorageEnvelope's embedded checksum.

DRY: replace inline xxh3_64(data).to_be_bytes() with crate::checksum::checksum(data). The DRY-guard test (envelope_embeds_canonical_checksum) confirms byte-identical wire output before and after the refactor. The xxh3_64 import is retained at this point because extract() still uses it.

DRY: replace inline xxh3_64(&decompressed).to_be_bytes() + manual compare with crate::checksum::verify_checksum(). The ChecksumMismatch error variant is preserved on false return. Removes the now-dead xxhash_rust import from byte_storage.rs, so the single canonical xxHash3-64 definition lives in checksum.rs.

Updates the xxHash3-64 security property bullet to call out standalone availability via checksum/verify_checksum without requiring compression.

coderabbitai Bot commented Jun 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ebe5d086-581f-4d7e-b1b5-1e3e719228b4

📥 Commits

Reviewing files that changed from the base of the PR and between 7f5ebc1 and 4003b0e.

📒 Files selected for processing (4)
  • README.md
  • src/byte_storage.rs
  • src/checksum.rs
  • src/lib.rs

Walkthrough

A new src/checksum.rs module is added, implementing checksum and verify_checksum as standalone xxHash3-64 primitives. These are exported from src/lib.rs under a checksum feature flag. src/byte_storage.rs is refactored to delegate all checksum computation and verification to this new module, removing the direct xxh3_64 import.

Changes

Standalone Checksum Module

| Layer / File(s) | Summary |
|---|---|
| New checksum.rs primitive and crate wiring: src/checksum.rs, src/lib.rs | Adds pub fn checksum(data: &[u8]) -> [u8; 8] and pub fn verify_checksum(data: &[u8], expected: &[u8; 8]) -> bool with xxHash3-64 big-endian output and six unit tests. src/lib.rs exports these as a feature-gated pub mod checksum with re-exports at the crate root, and updates the Security Properties docs to mention standalone use. |
| byte_storage.rs delegation to crate::checksum: src/byte_storage.rs | Removes the direct xxh3_64 import; StorageEnvelope::new calls crate::checksum::checksum(data) and StorageEnvelope::extract calls crate::checksum::verify_checksum. Tests are updated to assert canonical checksum equality and that tampering yields specifically ByteStorageError::ChecksumMismatch. |
| README architecture diagram: README.md | Adds checksum.rs entry to the src/ directory tree in the Architecture section. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • cachekit-io/cachekit-core#48: Also modifies StorageEnvelope::new and checksum-related logic in src/byte_storage.rs, directly overlapping with this PR's refactoring of the same code path.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the primary change: introducing a standalone xxHash3-64 checksum API independent of compression, which is the central feature of the pull request. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
