[nanvix] E: Phase 1B — build 5 Tier-1 math + memory modules as .so (with libm-unbundling)#6
Open
esaurez wants to merge 1 commit into
Phase 1B of the .a -> .so migration (see nanvix-todo/cpython-static-to-shared-migration.md section 5). Builds on Phase 1A (#5) by promoting the remaining 5 Tier-1 "math + memory" stdlib extension modules from statically linked into python.elf to dlopen-loaded shared objects.

Modules moved to *shared* in Modules/Setup.local generation (.nanvix/docker.py):
- math, cmath, _statistics (libm consumers)
- mmap, _contextvars (libc-only)

None of the five reference external libraries beyond libc / libm.

==============================================================
libm is properly resolved through python.elf's .dynsym
==============================================================

This PR also drops libm.a from the per-module .so link commands. Without this, cpython's Makefile attaches libm.a to math / cmath / _statistics via the MODULE_*_LDFLAGS path, producing duplicate libm copies in each .so plus python.elf:

  math.cpython-312.so         pre-drop: 532 KB   post-drop: 381 KB
  cmath.cpython-312.so        pre-drop: 180 KB   post-drop:  91 KB
  _statistics.cpython-312.so  pre-drop:  28 KB   post-drop:  18 KB

python.elf size unchanged (libm.a is already linked via --whole-archive into the main binary; that hasn't changed). Total ramfs reduction: ~240 KB.

Implemented by passing --with-libm= (empty) to CPython configure so the math / cmath / _statistics module link commands no longer include libm.a. Each .so ends up with sqrt / cbrt / etc. as undefined references that the dynamic loader resolves at dlopen time against python.elf's .dynsym.

This is the canonical pattern for any Nanvix-on-static-libm CPython extension that uses libm: build the extension with no libm in its link command; at dlopen time the dynamic loader resolves libm names against python.elf's .dynsym, which has them as GLOBAL DEFAULT because libm.a is in python.elf's --whole-archive LIBS plus the libposix visibility-merge contamination has been fixed at the Nanvix level (esaurez/nanvix#26).
This matches how Linux dlopen'd CPython math modules resolve libm: through the main executable's .dynsym (Linux uses libm.so.6 instead of the static link, but the resolution mechanism is the same). Future Phase 2/3 modules with libm dependencies automatically benefit from this without needing libm.a baked into each .so.

==============================================================
Note on the 11 newlib-internal `__math_*` helpers
==============================================================

`__math_invalid`, `__math_oflow`, `__math_uflow`, `__math_divzero`, etc. are GLOBAL HIDDEN at source in `newlib/libm/common/math_config.h` (ported from ARM optimized-routines, same as glibc and musl). They are deliberately library-private and **do not** appear in python.elf's .dynsym even with this PR, and that is correct, not a bug.

They are called only from inside libm's own internals (e.g. __ieee754_sqrt -> __math_invalid), which live inside python.elf alongside the helpers themselves; the PC-relative call between them is resolved at static-link time and never touches .dynsym. dlopen'd modules don't need them; they call public names like sqrt, which dispatches to python.elf's sqrt, which internally calls __math_invalid if needed.

This is identical to how Linux ships libm.so.6 with the same HIDDEN attribute on the same helpers; dlopen'd Python extensions on Linux never need to look them up either.

==============================================================
Test coverage (.nanvix/test.py)
==============================================================

New phase1b_snippet imports each module, asserts it is NOT in sys.builtin_module_names, exercises one trivial API call to confirm dlopen + PyInit_<name> succeeded, and prints the resolved __file__ path. Phase 1A probe retained.
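The probe described above could look roughly like the following. This is a minimal sketch, not the actual `phase1b_snippet` (which is not shown here): the helper names `probe_shared_module`, `run_phase1b_probe`, and the `check_*` functions are hypothetical, and the specific API calls chosen for `_statistics`, `mmap`, and `_contextvars` are assumptions picked for illustration.

```python
import importlib
import sys


def check_math(m):
    return m.sqrt(4.0) == 2.0


def check_cmath(m):
    return m.sqrt(-1) == 1j


def check_statistics(m):
    # Private C helper behind statistics.NormalDist.inv_cdf (assumed choice);
    # the median of N(0, 1) is 0.
    return m._normal_dist_inv_cdf(0.5, 0.0, 1.0) == 0.0


def check_mmap(m):
    # Anonymous mapping (fileno -1): write two bytes and read them back.
    with m.mmap(-1, 4096) as mapping:
        mapping[:2] = b"ok"
        return mapping[:2] == b"ok"


def check_contextvars(m):
    return m.ContextVar("probe", default=1).get() == 1


# Hypothetical table: module name -> one trivial smoke check.
PHASE1B_CHECKS = {
    "math": check_math,
    "cmath": check_cmath,
    "_statistics": check_statistics,
    "mmap": check_mmap,
    "_contextvars": check_contextvars,
}


def probe_shared_module(name, check):
    """Require that `name` is not built in, import it, run `check`, return __file__."""
    assert name not in sys.builtin_module_names, f"{name} is still built in"
    module = importlib.import_module(name)
    assert check(module), f"{name}: smoke check failed"
    return module.__file__


def run_phase1b_probe():
    """Probe all five modules and print where each one was loaded from."""
    for name, check in PHASE1B_CHECKS.items():
        print(f"{name}: {probe_shared_module(name, check)}")
```

On the target, `run_phase1b_probe()` would be expected to print a path under `lib-dynload/` for each module. On a host interpreter where some of these modules are compiled in, the built-in assertion fails by design.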
==============================================================
Prerequisites
==============================================================

Requires esaurez/nanvix#26 (libposix compiler_builtins libm visibility fix) merged and the toolchain image rebuilt to carry the patched libposix.a. Without that, math.so dlopen would fail with "symbol not found" on sqrt, exactly the failure mode that motivated nanvix#26.

==============================================================
Validation
==============================================================

Tested on phase0-llfix toolchain overlay:
- All 5 new .so files produced and installed under lib-dynload/.
- nm python.elf no longer shows PyInit_<name> for any of the 5.
- python.elf size: 19.31 MB (Phase 1A) -> 19.18 MB (Phase 1B), -130 KB.
- readelf --dyn-syms python.elf shows sqrt / cbrt / fma as GLOBAL DEFAULT (the libposix-fix landed in nanvix#26 makes this work).
- math.sqrt(4.0) == 2.0 and cmath.sqrt(-1) == 1j round-trip via dlopen.
- Hello + Phase 1A probe + Phase 1B probe + lxml + HTTP smoke + full regrtest 160/160 PASS in standalone mode.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
daa0c44 to b91f9d8
esaurez pushed a commit that referenced this pull request on Jun 3, 2026
Phase 1C of the .a -> .so migration (see nanvix-todo/cpython-static-to-shared-migration.md section 5). Builds on Phase 1B (#6, #7) by promoting the remaining 8 Tier-1 "text codec" stdlib extension modules from statically linked into python.elf to dlopen-loaded shared objects under lib/python3.12/lib-dynload/.

Modules moved to *shared* in Modules/Setup.local generation (.nanvix/docker.py):
- unicodedata: Unicode database lookups (the big one, with 1.2 MB of unicode data tables).
- _multibytecodec: shared CJK codec infrastructure.
- _codecs_cn / _codecs_hk / _codecs_iso2022 / _codecs_jp / _codecs_kr / _codecs_tw: per-region CJK codec tables.

None of the eight reference external libraries; they are pure C with embedded data tables. They link against the same -lc / runtime symbols that the rest of the Phase 1 modules use.

Test coverage (.nanvix/test.py):
- New phase1c_snippet imports each module, asserts it is NOT in sys.builtin_module_names, exercises one trivial API call to confirm dlopen + PyInit_<name> succeeded (unicodedata.lookup, _multibytecodec.__create_codec, _codecs_<region>.getcodec), and prints the resolved __file__ path. Phase 1A/1B probes retained.

Validation on local toolchain (phase0-llfix):
- All 8 new .so files produced and installed under lib-dynload/ (unicodedata 1193K, _codecs_jp 262K, _codecs_hk 168K, _codecs_cn 155K, _codecs_kr 145K, _multibytecodec 147K, _codecs_tw 115K, _codecs_iso2022 76K; total ~2.2 MB across the eight files).
- nm python.elf no longer shows PyInit_<name> for any of the 8.
- python.elf size: 19.18 MB (Phase 1B) -> 17.48 MB (Phase 1C), -1.70 MB. Biggest single-phase reduction so far because the CJK codec tables and the Unicode database are large.
- Hello + Phase 1A + Phase 1B + Phase 1C import probes + lxml + HTTP smoke + full regrtest 160/160 PASS in standalone mode.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
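As a rough illustration of the three API entry points named in the Phase 1C test coverage, the sketch below exercises `unicodedata.lookup`, `_codecs_<region>.getcodec`, and the presence of `_multibytecodec.__create_codec`. It is not the real `phase1c_snippet`: the function names and the codec chosen per region are assumptions made for this example.

```python
import importlib

# One codec name per regional module; chosen for illustration only.
REGION_CODECS = {
    "_codecs_cn": "gb2312",
    "_codecs_hk": "big5hkscs",
    "_codecs_iso2022": "iso2022_jp",
    "_codecs_jp": "shift_jis",
    "_codecs_kr": "euc_kr",
    "_codecs_tw": "big5",
}


def check_unicodedata():
    unicodedata = importlib.import_module("unicodedata")
    return unicodedata.lookup("LATIN SMALL LETTER A") == "a"


def check_multibytecodec():
    module = importlib.import_module("_multibytecodec")
    # Calling __create_codec needs a codec capsule, so this sketch only
    # confirms that the entry point is exported and callable.
    return callable(getattr(module, "__create_codec", None))


def check_region_codec(module_name, codec_name):
    module = importlib.import_module(module_name)
    codec = module.getcodec(codec_name)
    # Plain ASCII decodes unchanged in each of the codecs listed above.
    text, consumed = codec.decode(b"abc")
    return (text, consumed) == ("abc", 3)


def run_phase1c_checks():
    """Return a dict of module name -> bool for all eight modules."""
    results = {
        "unicodedata": check_unicodedata(),
        "_multibytecodec": check_multibytecodec(),
    }
    for module_name, codec_name in REGION_CODECS.items():
        results[module_name] = check_region_codec(module_name, codec_name)
    return results
```

A full probe would additionally assert that none of the eight names appear in `sys.builtin_module_names` and print each module's `__file__`, as the commit message describes.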
Summary
Phase 1B of the `.a` → `.so` migration. Promotes 5 Tier-1 "math + memory" stdlib extension modules from statically linked into `python.elf` to dlopen-loaded shared objects: `math`, `cmath`, `_statistics`, `mmap`, `_contextvars`.

This PR also drops `libm.a` from the per-module `.so` link commands, following the canonical Linux-style pattern where each `.so` carries only its CPython glue and resolves `sqrt` / `cbrt` / etc. against `python.elf`'s `.dynsym` at dlopen time. The previously-separate "drop libm.a from math .so" PR (#7) is folded into this one per esaurez's preference for consolidated review.

Modules moved to `*shared*`

- `math`, `cmath`, `_statistics` (libm consumers)
- `mmap`, `_contextvars` (libc-only)

libm dlopen-resolution

Passes `--with-libm=` (empty) to CPython's `configure` so the math / cmath / _statistics `.so` link commands no longer include `libm.a`. Each `.so` ends up with `sqrt` / `cbrt` / etc. as undefined references resolved at dlopen time against `python.elf`'s `.dynsym`. python.elf has them as `GLOBAL DEFAULT` because `libm.a` is in python.elf's `--whole-archive` LIBS and the libposix visibility-merge contamination is fixed at the Nanvix level (`nanvix#26`).

This matches how Linux dlopen'd CPython math modules resolve libm: through the main executable's `.dynsym` (Linux uses `libm.so.6` instead of the static link; the resolution mechanism is identical).

Note on the 11 newlib-internal `__math_*` helpers

`__math_invalid`, `__math_oflow`, `__math_uflow`, `__math_divzero`, etc. are GLOBAL HIDDEN at source in `newlib/libm/common/math_config.h` (ported from ARM optimized-routines, same as glibc and musl). They are deliberately library-private and do not appear in `python.elf`'s `.dynsym`, and that is correct, not a bug. They are called only from inside libm's own internals (e.g. `__ieee754_sqrt → __math_invalid`), which live inside `python.elf` alongside the helpers themselves; the PC-relative call between them is resolved at static-link time and never touches `.dynsym`. dlopen'd modules don't need them; they call public names like `sqrt`, which dispatches to `python.elf`'s `sqrt`, which internally calls `__math_invalid` if needed. This is identical to how Linux ships `libm.so.6` with the same HIDDEN attribute on the same helpers; dlopen'd Python extensions on Linux never need to look them up either.

Size impact

[Table: per-file sizes for `math.so`, `cmath.so`, `_statistics.so`, `mmap.so`, `_contextvars.so`, and `python.elf`]

Test coverage

New `phase1b_snippet` in `.nanvix/test.py` imports each module, asserts non-builtin status, exercises one trivial API call (`math.sqrt(4.0)`, `cmath.sqrt(-1)`, etc.), and prints `__file__`. Phase 1A probe retained.

Validation

Tested on `phase0-llfix` toolchain overlay:

- All 5 new `.so` files produced and installed under `lib-dynload/`.
- `nm python.elf` no longer shows `PyInit_<name>` for any of the 5.
- `readelf --dyn-syms python.elf` shows `sqrt` / `cbrt` / `fma` as `GLOBAL DEFAULT` (the libposix-fix from `nanvix#26` makes this work).
- `math.sqrt(4.0) == 2.0` and `cmath.sqrt(-1) == 1j` round-trip via dlopen.

Prerequisites

Stacked on Phase 1A (esaurez/cpython#5). Requires `esaurez/nanvix#26` merged AND the toolchain image rebuilt to carry the patched `libposix.a`. Without that fix, `math.so` dlopen would fail with `ImportError: symbol not found` on `sqrt`, exactly the failure mode that motivated `nanvix#26`.

Risk

Mechanical configuration change: 5 entries added to the `*shared*` block of `Setup.local`, and a one-line change in `Makefile.nanvix` to drop libm from per-module `.so` LDFLAGS. All tests verify behavior is unchanged.
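The `readelf --dyn-syms` check listed under Validation could be automated along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the parser assumes the usual GNU readelf column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name), and `SAMPLE` is made-up illustrative text, not output captured from a real `python.elf`.

```python
def global_default_symbols(readelf_output):
    """Return names that are defined, GLOBAL, and DEFAULT in `readelf --dyn-syms` text."""
    names = set()
    for line in readelf_output.splitlines():
        fields = line.split()
        # Symbol rows look like: Num: Value Size Type Bind Vis Ndx Name
        if len(fields) < 8 or not fields[0].rstrip(":").isdigit():
            continue
        bind, vis, ndx, name = fields[4], fields[5], fields[6], fields[7]
        if bind == "GLOBAL" and vis == "DEFAULT" and ndx != "UND":
            # Drop any symbol-version suffix such as "@GLIBC_2.2.5".
            names.add(name.split("@")[0])
    return names


def missing_exports(readelf_output, required):
    """Return the required names that the executable does not export."""
    return set(required) - global_default_symbols(readelf_output)


# Illustrative input only; addresses, sizes, and section indexes are invented.
SAMPLE = """\
Symbol table '.dynsym' contains 4 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0804a010    52 FUNC    GLOBAL DEFAULT   12 sqrt
     2: 0804a050    64 FUNC    GLOBAL DEFAULT   12 cbrt
     3: 00000000     0 FUNC    GLOBAL DEFAULT  UND dlopen
"""

# In this invented sample, fma is absent, so it is reported as missing.
print(sorted(missing_exports(SAMPLE, ["sqrt", "cbrt", "fma"])))
```

Against a real build, the text would come from running `readelf --dyn-syms python.elf` (for example through `subprocess.run(..., capture_output=True, text=True).stdout`), and an empty result for `sqrt` / `cbrt` / `fma` would correspond to the validation item above. The hidden `__math_*` helpers are expected to be missing from this set.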