Arrow batch-size estimate breaks the bounded-memory guarantee on skewed/wide frames #161

## Summary
`_bounded_chunksize` estimates a uniform average, `bytes_per_row = table.nbytes // table.num_rows`, and caps batches by **row count only**. For variable-width or skewed frames (e.g. a JSON-blob column, or clustered wide rows), a single record batch can far exceed the 8 MiB target. Because zstd compression runs per batch, the compressor working set spikes with it, which is exactly the OOM regime this feature is meant to prevent.

## Evidence
- `src/cachekit/serializers/arrow_serializer.py:65-68` (estimate), `:223-227` (`max_chunksize` is rows-only), `:52-54`/`:218-222` (stated per-batch bound)
- Empirical: 100k tiny rows + 50 wide 2 MiB cells → chunksize 7966 rows → one batch ~100 MiB vs 8 MiB target (~12.5x overshoot)
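
The arithmetic behind the empirical figure can be sketched in plain Python. This is a model of the estimate described above, not the code in `arrow_serializer.py`: the helper names are hypothetical, and the 5-byte size for the tiny rows and the placement of the wide cells at the front of the frame are assumptions, chosen because they are consistent with the reported chunksize.

```python
MIB = 1024 * 1024
TARGET_BATCH_BYTES = 8 * MIB  # the 8 MiB per-batch target named in this issue


def rows_only_chunksize(row_sizes, target=TARGET_BATCH_BYTES):
    """Rows per batch derived from a uniform average (hypothetical model of the estimate)."""
    bytes_per_row = max(1, sum(row_sizes) // len(row_sizes))
    return max(1, target // bytes_per_row)


def largest_batch_bytes(row_sizes, chunksize):
    """Size in bytes of the heaviest batch when splitting by row count only."""
    return max(
        sum(row_sizes[i:i + chunksize])
        for i in range(0, len(row_sizes), chunksize)
    )


# Assumed layout: 50 wide 2 MiB cells clustered together, then 100k rows of 5 bytes each.
row_sizes = [2 * MIB] * 50 + [5] * 100_000

chunksize = rows_only_chunksize(row_sizes)
worst = largest_batch_bytes(row_sizes, chunksize)
overshoot = worst / TARGET_BATCH_BYTES

print(f"chunksize={chunksize} rows, worst batch={worst / MIB:.1f} MiB, overshoot={overshoot:.1f}x")
```

Under these assumptions all 50 wide cells fall inside the first batch, so a row cap alone cannot bound its size.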

## Impact
Silently undermines the peak-RSS bound for the target workload. No data-integrity impact; no test coverage for chunking.

## Fix
Byte-aware batching: accumulate batches by estimated bytes (or cap by both rows and a byte budget). Add a skewed-frame memory regression test.
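
A minimal sketch of the "cap by both rows and a byte budget" option, assuming per-row byte estimates are already available (how to derive them from an Arrow table is left open here). `byte_aware_slices` and its parameters are hypothetical names, not existing cachekit API. The resulting `(offset, length)` pairs could then be passed to something like `pyarrow.Table.slice(offset, length)` to build each batch.

```python
from typing import Iterator, Sequence, Tuple


def byte_aware_slices(
    row_sizes: Sequence[int], max_bytes: int, max_rows: int
) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) row ranges capped by a byte budget and a row count.

    A single row larger than max_bytes cannot be split, so it is emitted
    as its own one-row batch.
    """
    start = 0  # first row of the batch being accumulated
    acc = 0    # estimated bytes accumulated in the current batch
    for i, size in enumerate(row_sizes):
        length = i - start
        # Close the current batch if adding this row would break either cap.
        if length > 0 and (acc + size > max_bytes or length >= max_rows):
            yield start, length
            start, acc = i, 0
        acc += size
    if len(row_sizes) > start:
        yield start, len(row_sizes) - start


MIB = 1024 * 1024
# Same assumed skewed frame as in the issue: 50 wide 2 MiB cells plus 100k tiny rows.
skewed = [2 * MIB] * 50 + [5] * 100_000
slices = list(byte_aware_slices(skewed, max_bytes=8 * MIB, max_rows=7966))
heaviest = max(sum(skewed[off:off + n]) for off, n in slices)
print(f"{len(slices)} batches, heaviest={heaviest / MIB:.1f} MiB")
```

The same frame and caps could serve as the basis for the skewed-frame regression test: assert that no batch exceeds the byte budget unless it consists of a single oversized row.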

