[nanvix] E: Phase 1A — build 11 Tier-1 data primitives as .so#5
Open
esaurez wants to merge 1 commit into
Phase 1A of the .a -> .so migration (see nanvix-todo/cpython-static-to-shared-migration.md section 5). Builds on Phase 0 (#690 array) by promoting the remaining 11 Tier-1 "data primitives" stdlib extension modules from statically linked into python.elf to dlopen-loaded shared objects under lib/python3.12/lib-dynload/.

Modules moved to *shared* in Modules/Setup.local generation (.nanvix/docker.py):
- _bisect, _heapq, _struct, _random, _opcode (5 simple pure-C)
- _queue, _csv, binascii (3 medium)
- _json, _pickle, _zoneinfo (3 larger)

All eleven modules have no external library dependencies (verified by grepping their .c sources for #include <zlib|expat|openssl|sqlite|mpdec|bzlib|lzma|hacl>; none match), so they need nothing beyond the same -lc / -lm symbols that array.so already pulls from python.elf's --whole-archive --export-dynamic main binary.

Test coverage (.nanvix/test.py):
- New phase1a_snippet imports each module, asserts it is NOT in sys.builtin_module_names, exercises one trivial API call to confirm dlopen + PyInit_<name> succeeded, and prints the resolved __file__ path so the host-side regex sees the .so came from lib-dynload/.
- Smoke test, lxml import, HTTP server smoke, and full regrtest (160/160 modules) all continue to pass.

Validation on local toolchain (phase0-llfix overlay of phase0-stable with newlib %lld printf fix applied; see nanvix-todo/newlib-z-missing-io-long-long-flag.md):
- All 11 new .so files produced (~30-230 KB each) and installed under lib/python3.12/lib-dynload/.
- nm python.elf no longer shows PyInit_<name> for any of the 11 modules; the symbols moved to their respective .so files.
- python.elf size: 19.97 MB (Phase 0) -> 19.31 MB (Phase 1A), ~660 KB net reduction.
- Hello-world + Phase 1A import probe + lxml + HTTP smoke + full regrtest 160/160 PASS in standalone mode.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
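The import probe described under "Test coverage" can be sketched roughly as follows. This is a minimal illustration, not the actual phase1a_snippet from .nanvix/test.py: the names CHECKS, probe, and run_probe are hypothetical, and the per-module API calls are arbitrary choices. Unlike the real snippet, it reports the built-in status instead of asserting on it, so it also runs on interpreters where these modules are compiled in.

```python
import importlib
import sys

# One trivial call per module. These particular checks are illustrative
# assumptions, not the ones used in .nanvix/test.py.
CHECKS = {
    "_bisect": lambda m: m.bisect_left([1, 2, 3], 2) == 1,
    "_heapq": lambda m: m.heappop([3, 4, 5]) == 3,
    "_struct": lambda m: m.calcsize("<I") == 4,
    "_random": lambda m: 0.0 <= m.Random().random() < 1.0,
    "_opcode": lambda m: callable(m.stack_effect),
    "_queue": lambda m: m.SimpleQueue().empty(),
    "_csv": lambda m: callable(m.reader),
    "binascii": lambda m: m.hexlify(b"\x01") == b"01",
    "_json": lambda m: m.encode_basestring_ascii("a") == '"a"',
    "_pickle": lambda m: m.loads(m.dumps([1, 2])) == [1, 2],
    "_zoneinfo": lambda m: hasattr(m, "ZoneInfo"),
}


def probe(name):
    """Import one module and record how it was loaded."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        # A failed dlopen / PyInit_<name> surfaces here as ImportError.
        return {"name": name, "imported": False, "error": str(exc)}
    return {
        "name": name,
        "imported": True,
        # True means the module is still linked into the interpreter binary.
        "builtin": name in sys.builtin_module_names,
        "api_ok": bool(CHECKS[name](module)),
        # For a shared module this is the path of the .so that was loaded.
        "file": getattr(module, "__file__", None),
    }


def run_probe():
    """Probe every module listed in CHECKS, in order."""
    return [probe(name) for name in CHECKS]
```

On the target, each entry would be expected to show builtin as False and a file path under lib-dynload/; printing the entries from run_probe() gives the host side something to match against.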
esaurez pushed a commit that referenced this pull request on Jun 3, 2026
Phase 1B of the .a -> .so migration (see nanvix-todo/cpython-static-to-shared-migration.md section 5). Builds on Phase 1A (#5) by promoting the remaining 5 Tier-1 "math + memory" stdlib extension modules from statically linked into python.elf to dlopen-loaded shared objects.

Modules moved to *shared* in Modules/Setup.local generation (.nanvix/docker.py):
- math, cmath, _statistics (libm consumers)
- mmap, _contextvars (libc-only)

None of the five reference external libraries beyond libc / libm.

==============================================================
libm is properly resolved through python.elf's .dynsym
==============================================================

This PR also drops libm.a from the per-module .so link commands. Without this, cpython's Makefile attaches libm.a to math / cmath / _statistics via the MODULE_*_LDFLAGS path, producing duplicate libm copies in each .so plus python.elf:

  math.cpython-312.so         pre-drop: 532 KB   post-drop: 381 KB
  cmath.cpython-312.so        pre-drop: 180 KB   post-drop:  91 KB
  _statistics.cpython-312.so  pre-drop:  28 KB   post-drop:  18 KB

python.elf size unchanged (libm.a is already linked via --whole-archive into the main binary; that hasn't changed). Total ramfs reduction: ~240 KB.

Implemented by passing --with-libm= (empty) to CPython configure so the math / cmath / _statistics module link commands no longer include libm.a. Each .so ends up with sqrt / cbrt / etc. as undefined references that the dynamic loader resolves at dlopen time against python.elf's .dynsym.

This is the canonical pattern for any Nanvix-on-static-libm CPython extension that uses libm: build the extension with no libm in its link command; at dlopen time the dynamic loader resolves libm names against python.elf's .dynsym, which has them as GLOBAL DEFAULT because libm.a is in python.elf's --whole-archive LIBS plus the libposix visibility-merge contamination has been fixed at the Nanvix level (esaurez/nanvix#26).

This matches how Linux dlopen'd CPython math modules resolve libm, through the main executable's .dynsym (Linux uses libm.so.6 instead of the static link, but the resolution mechanism is the same). Future Phase 2/3 modules with libm dependencies automatically benefit from this without needing libm.a baked into each .so.

==============================================================
Note on the 11 newlib-internal `__math_*` helpers
==============================================================

`__math_invalid`, `__math_oflow`, `__math_uflow`, `__math_divzero`, etc. are GLOBAL HIDDEN at source in `newlib/libm/common/math_config.h` (ported from ARM optimized-routines, same as glibc and musl). They are deliberately library-private and **do not** appear in python.elf's .dynsym even with this PR; that is correct, not a bug.

They are called only from inside libm's own internals (e.g. __ieee754_sqrt -> __math_invalid), which live inside python.elf alongside the helpers themselves; the PC-relative call between them is resolved at static-link time and never touches .dynsym. dlopen'd modules don't need them: they call public names like sqrt, which dispatches to python.elf's sqrt, which internally calls __math_invalid if needed.

This is identical to how Linux ships libm.so.6 with the same HIDDEN attribute on the same helpers; dlopen'd Python extensions on Linux never need to look them up either.

==============================================================
Test coverage (.nanvix/test.py)
==============================================================

New phase1b_snippet imports each module, asserts it is NOT in sys.builtin_module_names, exercises one trivial API call to confirm dlopen + PyInit_<name> succeeded, and prints the resolved __file__ path. Phase 1A probe retained.

==============================================================
Prerequisites
==============================================================

Requires esaurez/nanvix#26 (libposix compiler_builtins libm visibility fix) merged and the toolchain image rebuilt to carry the patched libposix.a. Without that, math.so dlopen would fail with "symbol not found" on sqrt, which is exactly the failure mode that motivated nanvix#26.

==============================================================
Validation
==============================================================

Tested on phase0-llfix toolchain overlay:
- All 5 new .so files produced and installed under lib-dynload/.
- nm python.elf no longer shows PyInit_<name> for any of the 5.
- python.elf size: 19.31 MB (Phase 1A) -> 19.18 MB (Phase 1B), -130 KB.
- readelf --dyn-syms python.elf shows sqrt / cbrt / fma as GLOBAL DEFAULT (the libposix-fix landed in nanvix#26 makes this work).
- math.sqrt(4.0) == 2.0 and cmath.sqrt(-1) == 1j round-trip via dlopen.
- Hello + Phase 1A probe + Phase 1B probe + lxml + HTTP smoke + full regrtest 160/160 PASS in standalone mode.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
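The readelf --dyn-syms check in the validation list could be automated along these lines. This is a rough sketch under stated assumptions: the helper names exported_symbols and missing_exports are hypothetical, and the parser assumes the usual eight-column readelf row layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name).

```python
def exported_symbols(dyn_syms_text):
    """Map symbol name -> (bind, visibility, ndx) from `readelf --dyn-syms` text."""
    table = {}
    for line in dyn_syms_text.splitlines():
        fields = line.split()
        # Symbol rows start with "<number>:"; headers and blank lines do not.
        if len(fields) < 8 or not fields[0].rstrip(":").isdigit():
            continue
        bind, vis, ndx = fields[4], fields[5], fields[6]
        # Drop any "@VERSION" suffix that readelf appends to versioned symbols.
        name = fields[7].split("@")[0]
        table[name] = (bind, vis, ndx)
    return table


def missing_exports(dyn_syms_text, wanted):
    """Return the wanted names that are not defined GLOBAL DEFAULT symbols."""
    table = exported_symbols(dyn_syms_text)
    missing = []
    for name in wanted:
        entry = table.get(name)
        if entry is None:
            missing.append(name)  # not in .dynsym at all
            continue
        bind, vis, ndx = entry
        # An UND entry is only a reference; the loader cannot resolve against it.
        if (bind, vis) != ("GLOBAL", "DEFAULT") or ndx == "UND":
            missing.append(name)
    return missing
```

One way to use it is to capture the output of subprocess.run(["readelf", "--dyn-syms", "python.elf"], capture_output=True, text=True) and pass its stdout with ["sqrt", "cbrt", "fma"]; an empty result would correspond to the validation bullet above. The hidden `__math_*` helpers are expected to be reported as missing by such a check, which is consistent with the note about them.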
Summary

Phase 1A of the `.a` → `.so` migration (see `nanvix-todo/cpython-static-to-shared-migration.md` section 5). Builds on Phase 0 (#690, `array`) by promoting the remaining 11 Tier-1 "data primitive" stdlib extension modules from statically linked into `python.elf` to dlopen-loaded shared objects under `lib/python3.12/lib-dynload/`.

Modules moved to `*shared*`

Eleven new `.so` files, edited in `.nanvix/docker.py`'s `Modules/Setup.local` generation:

- `_bisect`, `_heapq`, `_struct`, `_random`, `_opcode`.
- `_queue`, `_csv`, `binascii`.
- `_json`, `_pickle`, `_zoneinfo`.

All eleven have no external library dependencies: grepping their `.c` sources for `#include <zlib|expat|openssl|sqlite|mpdec|bzlib|lzma|hacl>` yields no matches. They need nothing beyond the same `-lc` / `-lm` symbols that `array.so` already pulls from `python.elf`'s `--whole-archive` `--export-dynamic` main binary.

Test coverage

`phase1a_snippet` in `.nanvix/test.py` imports each module, asserts it is NOT in `sys.builtin_module_names`, exercises one trivial API call to confirm `dlopen` + `PyInit_<name>` succeeded, and prints the resolved `__file__` path so the host-side regex sees the `.so` came from `lib-dynload/`.

Validation

Tested locally on the `phase0-llfix` toolchain overlay (`phase0-stable` + the newlib `%lld` printf fix from nanvix/newlib#14; see `nanvix-todo/newlib-z-missing-io-long-long-flag.md`):

- All 11 new `.so` files produced (~30–230 KB each) and installed under `lib/python3.12/lib-dynload/`.
- `nm python.elf` no longer shows `PyInit_<name>` for any of the 11 modules; the symbols moved to their respective `.so` files (12/12 OK including `array`).
- `python.elf` size: 19.97 MB (Phase 0) → 19.31 MB (Phase 1A), ~660 KB net reduction.

Prerequisites

Stacked on top of Phase 0 (esaurez/cpython#? → eventually nanvix/cpython#690). This PR's base is `feat/phase0-array-so`, so the diff is only the 11 new modules; rebase onto `nanvix/v3.12.3` once Phase 0 merges. See `nanvix-todo/open-pr-merge-order.md` for the full cross-repo merge order. No additional newlib / gcc / nanvix PRs are needed beyond what Phase 0 requires.

Risk

Mechanical configuration change only: adds 11 entries to the `*shared*` block of `Setup.local`. Each module's source is unchanged. If any module fails to load at runtime, the failure is isolated (an `ImportError` on that specific module); the rest of the interpreter is unaffected.
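For readers unfamiliar with CPython's `Setup.local` format, the 11 entries could be generated along these lines. This is a hypothetical sketch, not the code in `.nanvix/docker.py`: the names `PHASE_1A_MODULES` and `render_shared_block` are invented for illustration, and the source file names are assumed from CPython 3.12's stock `Modules/Setup.stdlib.in`.

```python
# Module name -> C source file. The file names are an assumption based on
# CPython 3.12's Modules/Setup.stdlib.in, not taken from this PR.
PHASE_1A_MODULES = {
    "_bisect": "_bisectmodule.c",
    "_heapq": "_heapqmodule.c",
    "_struct": "_struct.c",
    "_random": "_randommodule.c",
    "_opcode": "_opcode.c",
    "_queue": "_queuemodule.c",
    "_csv": "_csv.c",
    "binascii": "binascii.c",
    "_json": "_json.c",
    "_pickle": "_pickle.c",
    "_zoneinfo": "_zoneinfo.c",
}


def render_shared_block(modules):
    """Render a Setup.local fragment that builds the given modules as shared objects."""
    # Every module line that follows a "*shared*" marker is built as a
    # dlopen-able .so instead of being linked into the interpreter.
    lines = ["*shared*"]
    lines += [f"{name} {source}" for name, source in modules.items()]
    return "\n".join(lines) + "\n"
```

Calling `render_shared_block(PHASE_1A_MODULES)` produces a `*shared*` line followed by one `name source.c` line per module; moving a module back to static linking would amount to dropping its line from this block.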