[build] Freeze startup encoding modules into CPython binary #317
Pull request overview
This PR aims to freeze the full encodings.* package into the CPython binary to reduce startup time by eliminating filesystem I/O during interpreter initialization (especially impactful in the Nanvix/ramfs environment).
Changes:
- Enable freezing of all encodings.* modules via Tools/build/freeze_modules.py.
- Extend the frozen module registry (Python/frozen.c) and build rules (Makefile.pre.in, PCbuild/_freeze_module.*) to include the encodings modules.
- Add a guard in Python/import.c to avoid crashing when a deepfrozen-only module has no marshalled byte buffer in subinterpreters.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| Tools/build/freeze_modules.py | Enables freezing of encodings.* via the module selection list. |
| Python/import.c | Prevents NULL deref/crash when unmarshalling deepfrozen-only modules in subinterpreters. |
| Python/frozen.c | Adds encodings frozen entries + os.path platform mapping; updates frozen module tables. |
| Makefile.pre.in | Adds freeze/deepfreeze inputs/outputs and per-module freeze targets for encodings on POSIX builds. |
| PCbuild/_freeze_module.vcxproj | Adds Windows freeze targets for encodings modules. |
| PCbuild/_freeze_module.vcxproj.filters | Adds encodings sources into the Visual Studio filter list. |
247b9fe to c128c47
Freeze the five encoding modules imported during every interpreter
startup into the CPython binary, eliminating filesystem I/O that is
prohibitively slow on Nanvix ramfs/microvm.
During CPython startup, 25 modules are imported. Prior to this change,
20 were frozen/builtin but the encodings package and its codecs required
filesystem reads from ramfs. On Nanvix each file open/read/close
involves VM exits, adding measurable latency to startup.
Only the 5 startup-critical codecs are frozen (not all ~100 encoding
modules): encodings, encodings.aliases, encodings.ascii,
encodings.latin_1, and encodings.utf_8. Binary growth is ~92 KB.
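One way to confirm that a given build actually carries frozen copies of these five modules is to ask the frozen importer directly. The sketch below is an illustration under stated assumptions, not part of this change: the helper names `has_frozen_copy` and `frozen_report` are hypothetical, and the check relies only on the standard `importlib.machinery.FrozenImporter.find_spec`, which returns `None` when the running binary has no usable frozen copy of a name.

```python
from importlib.machinery import FrozenImporter

# The five modules named in the commit message above.
STARTUP_ENCODING_MODULES = (
    "encodings",
    "encodings.aliases",
    "encodings.ascii",
    "encodings.latin_1",
    "encodings.utf_8",
)


def has_frozen_copy(name):
    """Return True if the running interpreter can load `name` as a frozen module."""
    return FrozenImporter.find_spec(name) is not None


def frozen_report(names=STARTUP_ENCODING_MODULES):
    """Map each module name to whether a frozen copy is available."""
    return {name: has_frozen_copy(name) for name in names}


if __name__ == "__main__":
    for name, frozen in frozen_report().items():
        print(f"{name:20s} {'frozen' if frozen else 'not frozen'}")
```

On a stock CPython build, every entry would be expected to report "not frozen"; on a build with this change applied, all five should report "frozen". Running with `-X frozen_modules=off` would likely also produce "not frozen", since that option disables frozen stdlib modules.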
Benchmark (microvm/standalone, 128 MB, 20 A/B interleaved pairs):
| Metric | Baseline | Frozen | Delta |
|---|---|---|---|
| Mean (ms) | 79.8 | 78.0 | -1.8 |
| Median (ms) | 79.0 | 70.5 | -8.5 (10.8%) |

Frozen wins 13/20 pairs
Changes:
- Tools/build/freeze_modules.py: add 5 encoding modules to FROZEN list
- Python/frozen.c: auto-regenerated (includes, externs, table entries)
- Makefile.pre.in: auto-regenerated (FROZEN_FILES_IN/OUT, build rules,
deepfreeze deps)
- PCbuild/_freeze_module.vcxproj{,.filters}: auto-regenerated
Supersedes #317.
c128c47 to 1bf0ad5
Pull request overview
This PR reduces CPython interpreter startup latency on Nanvix by freezing the small set of encoding-related modules that are imported on every startup into the binary, avoiding slow ramfs filesystem I/O during initialization.
Changes:
- Add encodings (package) and four startup-critical codec modules to the frozen stdlib startup set.
- Regenerate frozen-module artifacts (Python/frozen.c, Makefile.pre.in, and Windows freeze project files) to include the new modules.
| File | Description |
|---|---|
| Tools/build/freeze_modules.py | Adds encodings + key codec modules to the “startup, without site” frozen module list. |
| Python/frozen.c | Registers the new frozen modules (includes, externs, and stdlib_modules[] entries). |
| Makefile.pre.in | Adds the new inputs/outputs and per-module freeze targets for POSIX builds and deepfreeze dependency lists. |
| PCbuild/_freeze_module.vcxproj | Adds the new modules to the Windows frozen-header generation and deepfreeze command list. |
| PCbuild/_freeze_module.vcxproj.filters | Adds the new Python source entries to the Visual Studio filters list. |
Copilot's findings
- Files reviewed: 5/5 changed files
- Comments generated: 0 new
Summary
Freeze the five encoding modules imported during every interpreter startup into
the CPython binary, eliminating filesystem I/O that is prohibitively slow on
Nanvix ramfs/microvm.
Motivation
During CPython startup, 25 modules are imported. Prior to this change, 20 were
frozen/builtin but the encodings package and its codecs required filesystem reads
from ramfs. On Nanvix each file open/read/close involves VM exits, adding
measurable latency to startup.
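The split between frozen, built-in, and filesystem-loaded startup modules can be checked on any build with a small probe. The following is a minimal sketch under assumptions: the function names are hypothetical, and the `-I -S` flags are one plausible way to approximate a bare startup, not necessarily how the counts above were obtained. It launches a fresh interpreter, snapshots `sys.modules` before importing anything else, and groups the modules by `__spec__.origin`.

```python
import json
import subprocess
import sys
from collections import Counter

# Code run in the child interpreter. The snapshot is taken before `json`
# is imported so that the probe itself does not inflate the module list.
_PROBE = """
import sys
snapshot = list(sys.modules.items())
import json
origins = {}
for name, module in snapshot:
    spec = getattr(module, "__spec__", None)
    origins[name] = getattr(spec, "origin", None)
print(json.dumps(origins))
"""


def startup_module_origins(python=sys.executable, extra_args=("-I", "-S")):
    """Return {module name: spec origin} for modules loaded at startup."""
    result = subprocess.run(
        [python, *extra_args, "-c", _PROBE],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def bucket(origin):
    """Classify a spec origin as frozen, built-in, filesystem, or unknown."""
    if origin in ("frozen", "built-in"):
        return origin
    if origin is None:
        return "unknown"
    return "filesystem"


def summarize(origins):
    """Count startup modules per bucket."""
    return dict(Counter(bucket(origin) for origin in origins.values()))


if __name__ == "__main__":
    origins = startup_module_origins()
    print(summarize(origins))
    for name, origin in sorted(origins.items()):
        if bucket(origin) == "filesystem":
            print("from filesystem:", name)
```

Modules listed as "from filesystem" are the ones that still require file reads during startup; with this change applied, the encodings entries should move to the frozen bucket.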
Changes
- Tools/build/freeze_modules.py: Add encodings package + 4 codec modules to FROZEN list
- Python/frozen.c: Auto-regenerated with 5 new frozen module entries
- Makefile.pre.in: Auto-regenerated with freeze targets for encoding modules
- PCbuild/_freeze_module.vcxproj{,.filters}: Auto-regenerated for Windows builds
Design Decision
Only the 5 startup-critical codecs are frozen (not all ~100 encoding modules):
encodings, encodings.aliases, encodings.ascii, encodings.latin_1, and encodings.utf_8. Binary growth is ~92 KB.
Benchmark
Platform: microvm/standalone, 128 MB, Nanvix v0.12.364
Method: 20 A/B interleaved pairs on WSL2 (i9-12900H, KVM)
The improvement is modest on a fast KVM host (~80 ms total startup).
On production Nanvix deployments with higher VM-exit latency, the saving
is expected to scale roughly in proportion to that latency.
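For reference, the paired statistics used for this kind of A/B comparison (mean delta, median delta, and per-pair wins) can be derived from raw timings roughly as follows. This is a sketch under assumptions, not the harness used for the numbers reported here: `run_interleaved` and `summarize_pairs` are hypothetical names, and the two `measure_*` callables stand in for whatever launches each binary and returns its startup time in milliseconds.

```python
import statistics


def run_interleaved(measure_baseline, measure_frozen, pairs=20):
    """Collect `pairs` timings per variant, alternating baseline then frozen.

    Interleaving means slow drift on the host (thermal throttling, background
    load) affects both variants about equally.
    """
    baseline_ms, frozen_ms = [], []
    for _ in range(pairs):
        baseline_ms.append(measure_baseline())
        frozen_ms.append(measure_frozen())
    return baseline_ms, frozen_ms


def summarize_pairs(baseline_ms, frozen_ms):
    """Summarize paired timings; negative deltas mean the frozen build is faster."""
    if not baseline_ms or len(baseline_ms) != len(frozen_ms):
        raise ValueError("need two non-empty timing lists of equal length")
    mean_delta = statistics.mean(frozen_ms) - statistics.mean(baseline_ms)
    median_baseline = statistics.median(baseline_ms)
    median_delta = statistics.median(frozen_ms) - median_baseline
    return {
        "pairs": len(baseline_ms),
        "mean_delta_ms": mean_delta,
        "median_delta_ms": median_delta,
        "median_delta_pct": 100.0 * median_delta / median_baseline,
        # A pair counts as a win when the frozen run was strictly faster.
        "frozen_wins": sum(f < b for b, f in zip(baseline_ms, frozen_ms)),
    }
```

Reporting the median and the win count alongside the mean helps when a few outlier runs dominate the mean, which is common for startup measurements in the tens of milliseconds.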
Supersedes the earlier revision of #317 which included pre-compiled .h
headers in the commit. This version relies on the build system to
generate them, matching upstream CPython conventions.