adding --split-si functionality to rxiv-maker#306
Conversation
|
test comment - please ignore |
|
Code Review: PR 306 - Good addition for journal submissions. The overall structure is clean and the core feature works logically. A few issues to address before merging. OVERVIEW: Adds two new CLI flags to rxiv docx: --split-si (emits __main.docx + __si.docx via two separate DocxExporter runs) and --split-si-pdf (emits __main.docx + __si.pdf by building the full PDF then splitting it). DocxExporter gains content_mode ("full" / "main" / "si") and output_suffix parameters. BUG 1 - Documentation placed under the wrong command (docs/cli-reference.md): The --split-si bullet was added to the rxiv pdf options block, not rxiv docx. The rxiv docx section added later in the same file already documents both flags correctly — the stray entry under rxiv pdf should be removed. BUG 2 - str.replace(".docx", ...) replaces all occurrences (docx_exporter.py:93): docx_filename = docx_filename.replace(".docx", f"{self.output_suffix}.docx") — str.replace replaces every match. If the author name or title contains .docx the filename would be corrupted. Safer: stem = Path(docx_filename).stem; docx_filename = f"{stem}{self.output_suffix}.docx" CROSS-REFERENCE CORRECTNESS: In SI-only mode (content_mode="si"), the main document is not loaded, so figure_map in export() is built only from SI content. Any @fig:label in the SI that points to a main-document figure will silently produce wrong numbers or survive as a raw marker. The existing hide_si flow handles the inverse case (main-only export loads SI for label mapping). The SI-only case needs symmetric treatment. This is a correctness gap worth at least a documented limitation in the help text. CODE QUALITY 3 - Inline imports inside _build_and_export_si_pdf: Five imports are deferred inside the function body. Move them to module level to match the style used throughout the rest of the file. CODE QUALITY 4 - Intermediate PDF files not cleaned up after --split-si-pdf: split_pdf() writes __main.pdf and __si.pdf to the build output directory. The function copies __si.pdf next to the manuscript but never removes the intermediates. The leftover __main.pdf in the build dir is a surprising side-effect. CODE QUALITY 5 - Implicit hide_si mutation in init: self.hide_si = docx_config.get("hide_si", False); if self.content_mode == "main": self.hide_si = True # silently overrides config — This repurposes hide_si as a proxy for content_mode, making the logic harder to follow. Consider driving _load_markdown from content_mode directly without mutating hide_si. TEST 6 - Fragile argument-order assertion (test_docx.py:75): mock_pdf.assert_called_once_with(str(manuscript_dir), False, False) — This pins the call to the default values of quiet and verbose. A future parameter addition breaks this test for the wrong reason. Prefer: mock_pdf.assert_called_once(); assert mock_pdf.call_args.args[0] == str(manuscript_dir) TEST 7 - No integration test for cross-reference resolution in split modes: Existing tests verify which sections appear/disappear but not that @fig:/@sfig: markers resolve correctly when content is split. A test asserting no raw cross-reference markers survive in the output would add meaningful coverage. SUMMARY: Feature logic is correct for the common case. --split-si was added to wrong section in cli-reference.md. str.replace on .docx is fragile. SI cross-refs to main figures are not handled — silent numbering errors possible. Build-dir leftovers after --split-si-pdf. Inline imports are a style issue. One fragile test assertion. Cross-ref test coverage is missing. The documentation placement bug and str.replace filename issue should be fixed before merging. The cross-reference gap is the most important correctness concern — needs either a fix or a documented limitation. |
Code Review: PR #306Good addition for journal submissions. The overall structure is clean and the core feature works logically. A few issues to address before merging. OverviewAdds two new CLI flags to
Bugs1. Documentation placed under the wrong command ( The 2. docx_filename = docx_filename.replace(".docx", f"{self.output_suffix}.docx")
stem = Path(docx_filename).stem
docx_filename = f"{stem}{self.output_suffix}.docx"Cross-Reference CorrectnessIn SI-only mode ( The existing Code Quality3. Inline imports inside Five imports are deferred inside the function body. Move them to module level to match the style used throughout the rest of the file. 4. Intermediate PDF files not cleaned up after
5. Implicit self.hide_si = docx_config.get("hide_si", False)
if self.content_mode == "main":
self.hide_si = True # silently overrides configThis repurposes Tests6. Fragile argument-order assertion ( mock_pdf.assert_called_once_with(str(manuscript_dir), False, False)This pins the call to the default values of mock_pdf.assert_called_once()
assert mock_pdf.call_args.args[0] == str(manuscript_dir)7. No integration test for cross-reference resolution in split modes Existing tests verify which sections appear/disappear but not that Summary
The documentation placement bug and |
The two split-si unit tests patched the helper functions via the dotted string "rxiv_maker.cli.commands.docx._export_docx_file". Because rxiv_maker.cli.commands.__init__ re-exports the `docx` command, that name shadows the `docx` submodule on the package, so unittest.mock resolves the target to the RichCommand object instead of the module. This passed on Python 3.11 but failed on 3.10 (a supported version per requires-python), and `import ... as` is fooled the same way (IMPORT_FROM uses getattr). Fetch the real module via importlib.import_module and use patch.object so the helpers bind correctly on all supported Python versions. No production code changes; integration tests already exercised the real export. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code Review: PR #306 —
|
| Area | Status |
|---|---|
| Feature logic | ✅ Correct for the common case |
--split-si in wrong doc section |
❌ Must fix |
str.replace(".docx") fragile filename |
❌ Must fix |
| SI cross-refs to main figures | |
| Intermediate PDF cleanup | |
| Deferred imports | |
hide_si mutation |
|
| Fragile test assertion | |
| Missing default-mode CLI test |
The two documentation/filename bugs should be fixed before merging. The cross-reference gap is the most significant correctness concern and needs either a fix or an explicit note in the help text.
PR #306 added `rxiv docx --split-si` / `--split-si-pdf` but shipped without a changelog entry. Document them under the 1.20.3 release so the feature is not released undocumented. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
This PR enables the --split-si and --split-si-pdf arguments to rxiv docx command:
This should be helpful tor journal submissions