[nanvix] E: Phase 1C - build 8 Tier-1 text-codec modules as .so #8
Closed
esaurez wants to merge 1 commit into
Phase 1C of the .a -> .so migration (see nanvix-todo/cpython-static-to-shared-migration.md section 5). Builds on Phase 1B (#6, #7) by promoting the remaining 8 Tier-1 "text codec" stdlib extension modules from statically linked into python.elf to dlopen-loaded shared objects under lib/python3.12/lib-dynload/.

Modules moved to *shared* in Modules/Setup.local generation (.nanvix/docker.py):
- unicodedata: Unicode database lookups (the big one, with about 1.2 MB of Unicode data tables).
- _multibytecodec: shared CJK codec infrastructure.
- _codecs_cn / _codecs_hk / _codecs_iso2022 / _codecs_jp / _codecs_kr / _codecs_tw: per-region CJK codec tables.

None of the eight reference external libraries; they are pure C with embedded data tables. They link against the same -lc / runtime symbols that the rest of the Phase 1 modules use.

Test coverage (.nanvix/test.py):
- New phase1c_snippet imports each module, asserts it is NOT in sys.builtin_module_names, exercises one trivial API call to confirm that dlopen + PyInit_<name> succeeded (unicodedata.lookup, _multibytecodec.__create_codec, _codecs_<region>.getcodec), and prints the resolved __file__ path.
- Phase 1A/1B probes retained.

Validation on local toolchain (phase0-llfix):
- All 8 new .so files produced and installed under lib-dynload/ (unicodedata 1193K, _codecs_jp 262K, _codecs_hk 168K, _codecs_cn 155K, _codecs_kr 145K, _multibytecodec 147K, _codecs_tw 115K, _codecs_iso2022 76K; total ~2.2 MB across the eight files).
- nm python.elf no longer shows PyInit_<name> for any of the 8.
- python.elf size: 19.18 MB (Phase 1B) -> 17.48 MB (Phase 1C), -1.70 MB. Biggest single-phase reduction so far because the CJK codec tables and the Unicode database are large.
- Hello + Phase 1A + Phase 1B + Phase 1C import probes + lxml + HTTP smoke + full regrtest 160/160 PASS in standalone mode.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
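For readers who want to reproduce the import probe described under "Test coverage", the following is a minimal sketch of what such a check could look like. It is not the actual phase1c_snippet from .nanvix/test.py; the names PHASE1C_MODULES, probe, probe_all and the require_shared flag are hypothetical, and the codec names passed to getcodec are simply one valid choice per region module.

```python
import importlib
import sys

# Module name -> codec name to request via getcodec().
# None means the module gets a different trivial API check below.
PHASE1C_MODULES = {
    "unicodedata": None,
    "_multibytecodec": None,
    "_codecs_cn": "gb2312",
    "_codecs_hk": "big5hkscs",
    "_codecs_iso2022": "iso2022_jp",
    "_codecs_jp": "shift_jis",
    "_codecs_kr": "euc_kr",
    "_codecs_tw": "big5",
}


def probe(name, codec=None, require_shared=False):
    """Import one module, exercise a trivial API, and report how it was loaded."""
    is_builtin = name in sys.builtin_module_names
    if require_shared:
        # On the target build the module must come from lib-dynload/,
        # not from the interpreter binary itself.
        assert not is_builtin, f"{name} is still statically linked"

    mod = importlib.import_module(name)

    if name == "unicodedata":
        assert mod.lookup("LATIN SMALL LETTER A") == "a"
    elif name == "_multibytecodec":
        # Only check that the entry point exists; calling it needs a codec capsule.
        assert callable(getattr(mod, "__create_codec"))
    else:
        assert mod.getcodec(codec) is not None

    # Built-in modules have no __file__, so fall back to None.
    return {"name": name, "builtin": is_builtin, "file": getattr(mod, "__file__", None)}


def probe_all(require_shared=False):
    return [probe(n, c, require_shared) for n, c in PHASE1C_MODULES.items()]


if __name__ == "__main__":
    for record in probe_all():
        print(record["name"], "builtin" if record["builtin"] else record["file"])
```

On the migrated build one would call probe_all(require_shared=True) so that a module still linked into python.elf fails loudly; the default is lenient because some host Python builds legitimately ship these modules as built-ins.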
Summary
Phase 1C of the .a → .so migration (see nanvix-todo/cpython-static-to-shared-migration.md section 5). Promotes the 8 Tier-1 "text codec" stdlib extension modules from statically linked into python.elf to dlopen-loaded shared objects under lib/python3.12/lib-dynload/. Completes Phase 1 (25 modules total across 1A + 1B + 1C).
Modules moved to *shared*
- unicodedata
- _codecs_jp
- _codecs_hk
- _codecs_cn
- _codecs_kr
- _multibytecodec
- _codecs_tw
- _codecs_iso2022
None reference external libraries: they are pure C with embedded data tables.
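As an illustration of what moving modules under *shared* amounts to, here is a small sketch that renders such a block in Modules/Setup syntax. It is not the code in .nanvix/docker.py: the function name render_shared_block and the PHASE1C_SOURCES mapping are hypothetical, and the source paths are assumed to follow the upstream CPython 3.12 Modules/ layout, so they should be checked against the tree before use.

```python
# Hypothetical mapping: module name -> source file, relative to Modules/.
# Paths are an assumption based on the upstream CPython source layout.
PHASE1C_SOURCES = {
    "unicodedata": "unicodedata.c",
    "_multibytecodec": "cjkcodecs/multibytecodec.c",
    "_codecs_cn": "cjkcodecs/_codecs_cn.c",
    "_codecs_hk": "cjkcodecs/_codecs_hk.c",
    "_codecs_iso2022": "cjkcodecs/_codecs_iso2022.c",
    "_codecs_jp": "cjkcodecs/_codecs_jp.c",
    "_codecs_kr": "cjkcodecs/_codecs_kr.c",
    "_codecs_tw": "cjkcodecs/_codecs_tw.c",
}


def render_shared_block(sources):
    """Return Setup-style text: a *shared* marker followed by one module per line."""
    lines = ["*shared*"]
    for name, src in sources.items():
        lines.append(f"{name} {src}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_shared_block(PHASE1C_SOURCES), end="")
```

Every module listed after the *shared* marker is built as a loadable extension rather than linked into the interpreter, which is why the change is a matter of adding eight lines.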
Test coverage
New phase1c_snippet in .nanvix/test.py imports each module, asserts non-builtin status, exercises one trivial API (unicodedata.lookup, _multibytecodec.__create_codec, _codecs_<region>.getcodec), and prints __file__. Phase 1A/1B probes retained.
Validation
Tested on phase0-llfix toolchain overlay:
- .so files installed under lib-dynload/.
- nm python.elf no longer shows PyInit_<name> for any of the 8.
- python.elf size: 19.18 MB (Phase 1B) → 17.48 MB (Phase 1C), −1.70 MB. Biggest single-phase reduction so far thanks to Unicode database + CJK codec tables.
Prerequisites
Stacked on Phase 1B-drop-libm (esaurez/cpython#7). No additional newlib / gcc / nanvix PRs needed.
Risk
Mechanical configuration change: 8 entries added to the *shared* block of Setup.local. Same pattern as Phase 1A/1B.
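Because the change is mechanical, a cheap regression guard is the nm check reported under Validation. The sketch below shows one way to automate it; the helper names leftover_pyinit and run_nm are hypothetical, and the parsing assumes the default nm output format in which the symbol name is the last whitespace-separated field on each line.

```python
import subprocess

PHASE1C_NAMES = [
    "unicodedata", "_multibytecodec", "_codecs_cn", "_codecs_hk",
    "_codecs_iso2022", "_codecs_jp", "_codecs_kr", "_codecs_tw",
]


def leftover_pyinit(nm_output, names):
    """Return the module names whose PyInit_<name> symbol still appears in nm output."""
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if fields:
            symbols.add(fields[-1])  # symbol name is the last field
    return [n for n in names if f"PyInit_{n}" in symbols]


def run_nm(binary_path):
    """Run nm on a binary and return its text output (requires nm on PATH)."""
    done = subprocess.run(["nm", binary_path], capture_output=True, text=True, check=True)
    return done.stdout


if __name__ == "__main__":
    # Synthetic nm output for illustration, not taken from a real python.elf.
    sample = "0000000000401000 T PyInit_math\n0000000000402000 T PyInit__codecs_jp\n"
    print(leftover_pyinit(sample, PHASE1C_NAMES))
```

Against a real build one would call leftover_pyinit(run_nm("python.elf"), PHASE1C_NAMES) and expect an empty list once all eight modules have moved to lib-dynload/.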