perf: borrow input in ByteStorage write path instead of copying it#48
Merged
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (4)
Walkthrough
ChangesStorageEnvelope::new borrow refactor
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
StorageEnvelope::new took its payload as Vec<u8> by value, but only ever compresses (lz4_flex::compress(&data)) and hashes (xxh3_64(&data)) it — both borrow. The Vec was never retained, so ByteStorage::store was forced to data.to_vec() its &[u8] input first: a full-payload allocation + memcpy (up to MAX_UNCOMPRESSED_SIZE = 512 MB) on every write, for nothing. Change StorageEnvelope::new to take &[u8]; store() passes its slice straight through. One fewer copy per write, scaling with payload size — directly trims the large-object write-path peak RSS. Addresses the cachekit-core half of #45. The py.allow_threads / GIL-release half lives in the cachekit-py PyO3 bindings and is tracked there. API note: StorageEnvelope::new is pub but functions as an internal constructor; the load-bearing API ByteStorage::store(&[u8]) is unchanged. Greenfield, no back-compat. Internal tests + 3 fuzz targets updated to pass slices.
739cb50 to
832e773
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem (cachekit-core half of #45)
StorageEnvelope::newtook its payload asVec<u8>by value, but only ever:lz4_flex::compress(&data)— borrowsxxh3_64(&data)— borrowsThe
Vecis never retained. SoByteStorage::store(&[u8])was forced todata.to_vec()its borrowed input before callingnew— a full-payload allocation + memcpy on every write, up toMAX_UNCOMPRESSED_SIZE(512 MB), for nothing.Fix
Change
StorageEnvelope::newto take&[u8].store()passes its slice straight through. One fewer copy per write, scaling with payload size — directly trims the large-object write-path peak RSS that motivated the cachekit-py 9.4× write-RSS work.API note (greenfield, no back-compat)
StorageEnvelope::newispubbut functions as an internal constructor — the load-bearing public API,ByteStorage::store(&[u8]), is unchanged. Across the workspace, no production code callsStorageEnvelope::newdirectly; only tests and fuzz targets do (updated here). cachekit-py'spython_bindings.rsgoes throughstore. Downstream fuzz targets that callnewdirectly update when they bump core.This is a
perf:change → it will cut a release; consumers pick up the faster core on their next bump.Scope
Only the primary input copy. The secondary
rmp_serde::to_vec(&envelope)re-copy ofcompressed_data(also noted in #45) needs manual envelope framing and is left out deliberately (KISS).The GIL-release half of #45 (
py.allow_threads) lives in cachekit-py's PyO3 bindings, not here.Verification
cargo fmt --check✅ ·cargo clippy --all-features -- -D warnings✅cargo test --all-features✅ (all suites pass; behavior unchanged)cargo +nightly checkon the fuzz crate ✅ (edited targets compile)byte_storage/roundtrip, before vs after): 5 of 6 sizes faster. Numbers are noisy (short 2s window; roundtrip includes the unaffected retrieve/decompress; bench ceiling is 64 KB, well below where copy-elimination dominates) so I'm not citing precise %s — the real, provable win is the eliminated ≤512 MB copy on the write path. Extending the bench to MB-scale store-only is a worthwhile follow-up.Summary by CodeRabbit
Release Notes