Skip to content

fix: support context manager protocol on CDXToolkitWARCWriter (#68)#70

Merged
malteos merged 3 commits into
mainfrom
fix/68-warc-writer-context-manager
May 28, 2026
Merged

fix: support context manager protocol on CDXToolkitWARCWriter (#68)#70
malteos merged 3 commits into
mainfrom
fix/68-warc-writer-context-manager

Conversation

@malteos
Copy link
Copy Markdown
Collaborator

@malteos malteos commented May 27, 2026

Summary

  • Fixes feat: Implement 'CDXToolkitWARCWriter' as a context manager #68: CDXToolkitWARCWriter now implements __enter__ / __exit__, so callers can write with cdx_toolkit.warc.get_writer(...) as writer: instead of relying on a manual writer.close().
  • Especially valuable for S3 destinations, where the multipart upload only commits on close — today an exception during iteration silently drops the upload.
  • The existing explicit close() API is preserved, so no caller is broken. Updated the CLI (warcer) and examples/iter-and-warc.py to demonstrate the idiomatic usage.

Test plan

  • pytest tests/unit/ — 31 passed, 1 skipped (AWS-gated), 0 failed
  • flake8 cdx_toolkit tests — clean
  • New unit tests cover: basic with-block writes a non-empty WARC, __enter__ returns self, exceptions inside the with block propagate while still closing the file, and close() is idempotent.

Add __enter__/__exit__ so callers can write
`with cdx_toolkit.warc.get_writer(...) as writer:` and have the WARC
flushed automatically — important on S3 where the multipart upload is
only committed on close, and where a mid-iteration exception used to
leave the upload uncommitted.

Update cli.py and examples/iter-and-warc.py to use the new idiom; the
explicit close() API stays for backward compatibility.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.24%. Comparing base (e18f93a) to head (ed5a1c9).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #70      +/-   ##
==========================================
+ Coverage   97.12%   97.24%   +0.11%     
==========================================
  Files           9        9              
  Lines         939      943       +4     
==========================================
+ Hits          912      917       +5     
+ Misses         27       26       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

malteos added 2 commits May 27, 2026 11:40
…ame cases

Earlier commit accidentally swept those through a replace_all; only the
new context-manager tests need info to be a dict (they actually call
write_record, which triggers warcio's create_warcinfo_record).
Adds focused unit tests that drive cli.warcer() with a mocked CDX
fetcher. Exercises the previously-uncovered revisit branch (the one
codecov flagged on the patch coverage check) along with the fgrep /
fgrepv skip paths and the --size pop-and-forward to the writer.
@malteos malteos requested review from lfoppiano and wumpus May 27, 2026 10:00
@malteos malteos merged commit 72f37d6 into main May 28, 2026
14 checks passed
@malteos malteos deleted the fix/68-warc-writer-context-manager branch May 28, 2026 08:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Implement 'CDXToolkitWARCWriter' as a context manager

2 participants