feat: support manifest compaction in table commits #400
        .read_for_full_compaction(&manifest_file, &delete_identifiers)
        .await?;
    if read_result.require_change {
        new_files.extend(self.write_compacted_entries(&read_result.entries).await?);
Full compaction should feed all changed entries into one rolling writer instead of rewriting each input manifest independently. With the current loop, if a table has many small manifests and manifest.full-compaction-threshold-size is reached without deletes, every manifest is must_change, but each one is written back as its own new manifest and merge() returns before the minor compaction path can merge them. The result is N small manifests with new names, so full compaction does a lot of I/O while leaving the manifest count and small-file problem unchanged. Java keeps a single RollingFileWriter for the whole toBeMerged set and rolls only when manifest.target-file-size is reached; this path should do the same, or fall back to minor compaction when no entries were actually removed.
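To make the suggestion concrete, here is a minimal in-memory sketch of the single-rolling-writer shape. It is not the paimon-rust API: `Entry`, `RollingWriter`, and `compact_all` are hypothetical names, sizes are plain byte counts supplied by the caller, and the roll check (after each write, once the accumulated size reaches the target) is an assumption rather than a copy of the Java behavior.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a manifest entry; only the name and size matter here.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub file_name: String,
    pub size_bytes: u64,
}

/// Hypothetical rolling writer: buffers entries for the current output manifest
/// and starts a new one once the accumulated size reaches `target_file_size`.
pub struct RollingWriter {
    target_file_size: u64,
    current: Vec<Entry>,
    current_size: u64,
    finished: Vec<Vec<Entry>>,
}

impl RollingWriter {
    pub fn new(target_file_size: u64) -> Self {
        Self {
            target_file_size,
            current: Vec::new(),
            current_size: 0,
            finished: Vec::new(),
        }
    }

    /// Append one entry, rolling to a new output manifest when the target is reached.
    pub fn write(&mut self, entry: Entry) {
        self.current_size += entry.size_bytes;
        self.current.push(entry);
        if self.current_size >= self.target_file_size {
            self.roll();
        }
    }

    /// Close the current output manifest if it holds any entries.
    fn roll(&mut self) {
        if !self.current.is_empty() {
            self.finished.push(std::mem::take(&mut self.current));
            self.current_size = 0;
        }
    }

    /// Flush the last partial manifest and return all output manifests.
    pub fn close(mut self) -> Vec<Vec<Entry>> {
        self.roll();
        self.finished
    }
}

/// Feed the surviving entries of every input manifest into ONE writer,
/// instead of rewriting each input manifest on its own.
pub fn compact_all(
    manifests: &[Vec<Entry>],
    deleted: &HashSet<String>,
    target_file_size: u64,
) -> Vec<Vec<Entry>> {
    let mut writer = RollingWriter::new(target_file_size);
    for manifest in manifests {
        for entry in manifest {
            if !deleted.contains(&entry.file_name) {
                writer.write(entry.clone());
            }
        }
    }
    writer.close()
}
```

With this shape, the number of output manifests depends on the total surviving size and the target file size, not on how many input manifests there were, which is what the small-file case needs.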
Purpose
Support manifest compaction during table commits and add Java-compatible manifest compaction options.
Tests
cargo fmt --all
cargo test -p paimon manifest_compaction --features storage-memory
cargo test -p paimon table_commit::tests --features storage-memory
git diff --check