[nanvix] E: Phase 1C - build 8 Tier-1 text-codec modules as .so #13
Open
esaurez wants to merge 1 commit into
Phase 1C of the .a -> .so migration (see nanvix-todo/cpython-static-to-shared-migration.md section 5). Builds on Phase 1B (#6, #7) by promoting the remaining 8 Tier-1 "text codec" stdlib extension modules from statically linked into python.elf to dlopen-loaded shared objects under lib/python3.12/lib-dynload/.

Modules moved to *shared* in Modules/Setup.local generation (.nanvix/docker.py):
- unicodedata: Unicode database lookups (the big one, with 1.2 MB of Unicode data tables).
- _multibytecodec: shared CJK codec infrastructure.
- _codecs_cn / _codecs_hk / _codecs_iso2022 / _codecs_jp / _codecs_kr / _codecs_tw: per-region CJK codec tables.

None of the eight reference external libraries; they are pure C with embedded data tables. They link against the same -lc / runtime symbols that the rest of the Phase 1 modules use.

Test coverage (.nanvix/test.py):
- New phase1c_snippet imports each module, asserts it is NOT in sys.builtin_module_names, exercises one trivial API call to confirm dlopen + PyInit_<name> succeeded (unicodedata.lookup, _multibytecodec.__create_codec, _codecs_<region>.getcodec), and prints the resolved __file__ path.
- Phase 1A/1B probes retained.

Validation on local toolchain (phase0-llfix):
- All 8 new .so files produced and installed under lib-dynload/ (unicodedata 1193K, _codecs_jp 262K, _codecs_hk 168K, _codecs_cn 155K, _codecs_kr 145K, _multibytecodec 147K, _codecs_tw 115K, _codecs_iso2022 76K; total ~2.2 MB across the eight files).
- nm python.elf no longer shows PyInit_<name> for any of the 8.
- python.elf size: 19.18 MB (Phase 1B) -> 17.48 MB (Phase 1C), -1.70 MB. Biggest single-phase reduction so far because the CJK codec tables and the Unicode database are large.
- Hello + Phase 1A + Phase 1B + Phase 1C import probes + lxml + HTTP smoke + full regrtest 160/160 PASS in standalone mode.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
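For reviewers who want to reproduce the import probe by hand, the check described under "Test coverage" could look roughly like the sketch below. It is not the phase1c_snippet from .nanvix/test.py: the names PROBES, probe and run_probes are hypothetical, and the codec names passed to getcodec are assumptions borrowed from the stock Lib/encodings wrappers. _multibytecodec.__create_codec expects a codec capsule, so the sketch only checks that it is callable.

```python
import importlib
import sys

# One trivial check per module. The codec names are assumptions taken from
# the stock Lib/encodings wrappers, not from the PR's phase1c_snippet.
PROBES = {
    "unicodedata": lambda m: m.lookup("LATIN SMALL LETTER A") == "a",
    "_multibytecodec": lambda m: callable(getattr(m, "__create_codec")),
    "_codecs_cn": lambda m: m.getcodec("gb2312") is not None,
    "_codecs_hk": lambda m: m.getcodec("big5hkscs") is not None,
    "_codecs_iso2022": lambda m: m.getcodec("iso2022_jp") is not None,
    "_codecs_jp": lambda m: m.getcodec("shift_jis") is not None,
    "_codecs_kr": lambda m: m.getcodec("euc_kr") is not None,
    "_codecs_tw": lambda m: m.getcodec("big5") is not None,
}


def probe(name, check):
    """Import one module and report how it was loaded."""
    module = importlib.import_module(name)
    return {
        # True means the module is still linked into the interpreter binary.
        "builtin": name in sys.builtin_module_names,
        # Built-in modules have no __file__; shared ones point at the .so.
        "file": getattr(module, "__file__", None),
        "ok": bool(check(module)),
    }


def run_probes(strict=False):
    """Probe every module; with strict=True, fail on any built-in module."""
    results = {}
    for name, check in PROBES.items():
        result = probe(name, check)
        if strict:
            assert not result["builtin"], f"{name} is still statically linked"
        assert result["ok"], f"{name} failed its API check"
        print(f"{name}: {result['file']}")
        results[name] = result
    return results
```

Calling run_probes(strict=True) on a Phase 1C build should print one lib-dynload/ path per module; the default strict=False lets the same sketch run on interpreters where some of these modules are still built in.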
Summary
Phase 1C of the .a → .so migration. Promotes the 8 Tier-1 "text codec" stdlib extension modules from statically linked into python.elf to dlopen-loaded shared objects: unicodedata, _multibytecodec, _codecs_cn, _codecs_hk, _codecs_iso2022, _codecs_jp, _codecs_kr, _codecs_tw.
Note: This PR replaces the original #8, which auto-closed when its base branch (feat/phase1b-drop-libm-from-math-so) was deleted as part of folding the drop-libm work into PR #6 (per esaurez's review preference for consolidating the .so move and lib-resolution changes into single PRs).
Size impact
python.elf: 19.18 MB → 17.48 MB (−1.70 MB, the biggest single-phase reduction at the time). unicodedata.so is 1193 KB (Unicode database tables).
Validation
Full regrtest 160/160 PASS + lxml + HTTP smoke + Phase 1A/1B/1C import probes.
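The commit message also reports that nm python.elf no longer shows PyInit_<name> for any of the eight modules. A minimal sketch of automating that check follows; the helper names leftover_init_symbols and check_elf are hypothetical, the default path "python.elf" is a placeholder for the real build output, and the parsing assumes the default nm output layout in which the symbol name is the last field on each line.

```python
import subprocess

MODULES = [
    "unicodedata", "_multibytecodec", "_codecs_cn", "_codecs_hk",
    "_codecs_iso2022", "_codecs_jp", "_codecs_kr", "_codecs_tw",
]


def leftover_init_symbols(nm_output, modules=MODULES):
    """Return the PyInit_<name> symbols that still appear in nm output."""
    wanted = {f"PyInit_{name}" for name in modules}
    found = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Exact match on the last field avoids prefix collisions.
        if fields and fields[-1] in wanted:
            found.add(fields[-1])
    return sorted(found)


def check_elf(path="python.elf"):
    """Run nm on the binary (placeholder path) and list leftover symbols."""
    out = subprocess.run(
        ["nm", path], check=True, capture_output=True, text=True
    ).stdout
    return leftover_init_symbols(out)
```

An empty list from check_elf would match the result reported for this PR; any entry names a module whose init function is still linked into the binary.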
Prerequisites
Stacked on Phase 1B (esaurez/cpython#6) which now includes the libm-unbundling formerly in PR #7. No other prereqs beyond what Phase 1 already requires.