[nanvix] E: Phase 1A — build 11 Tier-1 data primitives as .so #5

Open
esaurez wants to merge 1 commit into feat/phase0-array-so from feat/phase1a-tier1-data-shared

Conversation

@esaurez esaurez commented Jun 3, 2026

Owner

Summary

Phase 1A of the .a -> .so migration (see nanvix-todo/cpython-static-to-shared-migration.md section 5). Builds on Phase 0 (#690 array) by promoting the remaining 11 Tier-1 "data primitive" stdlib extension modules from being statically linked into python.elf to dlopen-loaded shared objects under lib/python3.12/lib-dynload/.

Modules moved to *shared*

Eleven new .so files, enabled by editing the Modules/Setup.local generation in .nanvix/docker.py:

  • Simple pure-C (5): _bisect, _heapq, _struct, _random, _opcode.
  • Medium (3): _queue, _csv, binascii.
  • Larger (3): _json, _pickle, _zoneinfo.

None of the eleven has an external library dependency: grepping their .c sources for #include <zlib|expat|openssl|sqlite|mpdec|bzlib|lzma|hacl> returns no matches. They need nothing beyond the same -lc / -lm symbols that array.so already pulls from python.elf, the main binary linked with --whole-archive --export-dynamic.
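The dependency check described above can be approximated in Python. This is a minimal sketch rather than the command actually run for this PR: the function names `external_includes` and `scan` are hypothetical, and the pattern also accepts quoted includes (`#include "..."`), which is slightly broader than the angle-bracket grep quoted above.

```python
import re
from pathlib import Path

# Library name fragments taken from the grep pattern in the PR description.
EXTERNAL = ("zlib", "expat", "openssl", "sqlite", "mpdec", "bzlib", "lzma", "hacl")

# Match an #include line whose header path contains any of the fragments.
PATTERN = re.compile(
    r'^[ \t]*#[ \t]*include[ \t]*[<"][^>"\n]*(?:%s)[^\n]*' % "|".join(EXTERNAL),
    re.IGNORECASE | re.MULTILINE,
)


def external_includes(source_text):
    """Return the #include lines in source_text that name an external library."""
    return [m.group(0).strip() for m in PATTERN.finditer(source_text)]


def scan(paths):
    """Map each file path to its offending include lines; an empty dict means clean."""
    hits = {}
    for path in paths:
        found = external_includes(Path(path).read_text(errors="replace"))
        if found:
            hits[str(path)] = found
    return hits
```

Calling `scan()` on the eleven module sources should return an empty dict if the claim above holds; the exact source paths depend on the CPython checkout and are not assumed here.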

Test coverage

  • New phase1a_snippet in .nanvix/test.py imports each module, asserts it is NOT in sys.builtin_module_names, exercises one trivial API call to confirm dlopen + PyInit_<name> succeeded, and prints the resolved __file__ path so the host-side regex sees the .so came from lib-dynload/.
  • The smoke test, lxml import, HTTP server smoke, and full regrtest (160/160 modules) all continue to pass.
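For readers without access to .nanvix/test.py, the shape of such an import probe might look like the sketch below. It is not the actual phase1a_snippet: `probe` and `exercise` are hypothetical names, and only four of the eleven modules are exercised to keep the example short.

```python
import importlib
import sys


def probe(name):
    """Import a module by name and report how it was loaded."""
    module = importlib.import_module(name)
    return {
        "name": name,
        # True when the module is compiled into the interpreter binary.
        "builtin": name in sys.builtin_module_names,
        # Built-in modules have no __file__; shared ones report their .so path.
        "file": getattr(module, "__file__", None),
    }


def exercise():
    """One trivial call per module, enough to show the module initialised."""
    import _bisect
    import _heapq
    import _struct
    import binascii

    assert _bisect.bisect_left([1, 3, 5], 3) == 1
    heap = [3, 1, 2]
    _heapq.heapify(heap)
    assert heap[0] == 1
    assert _struct.pack("<H", 1) == b"\x01\x00"
    assert binascii.hexlify(b"\x00\xff") == b"00ff"
    return True
```

On the Nanvix target one would additionally assert `not result["builtin"]` and `"lib-dynload" in result["file"]` for each module; on a stock host CPython those two checks may legitimately fail, because some of these modules are built in there.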

Validation

Tested locally on the phase0-llfix toolchain overlay (phase0-stable + the newlib %lld printf fix from nanvix/newlib#14; see nanvix-todo/newlib-z-missing-io-long-long-flag.md):

  • All 11 new .so files produced (~30–230 KB each) and installed under lib/python3.12/lib-dynload/.
  • nm python.elf no longer shows PyInit_<name> for any of the 11 modules; the symbols moved to their respective .so files (12/12 OK including array).
  • python.elf size: 19.97 MB (Phase 0) → 19.31 MB (Phase 1A), ~660 KB net reduction.
  • Hello-world + Phase 1A import probe + lxml + HTTP smoke + full regrtest 160/160 PASS in standalone mode.
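The `nm` check in the list above could be automated along these lines. This is a sketch under assumptions: the helper names are hypothetical, the parser expects the default three-column `nm` output for defined symbols, and the sample text is made up for illustration rather than taken from a real python.elf.

```python
PHASE_1A = {
    "_bisect", "_heapq", "_struct", "_random", "_opcode",
    "_queue", "_csv", "binascii", "_json", "_pickle", "_zoneinfo",
}


def pyinit_symbols(nm_output):
    """Return the module names that have a defined PyInit_<name> symbol."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols print as "<address> <type> <name>"; undefined ones
        # print as "U <name>" (two fields) and are skipped.
        if len(fields) == 3 and fields[2].startswith("PyInit_"):
            names.add(fields[2][len("PyInit_"):])
    return names


def still_static(nm_output, moved=PHASE_1A):
    """List moved modules whose init symbol is still defined in the binary."""
    return sorted(pyinit_symbols(nm_output) & set(moved))


# Made-up nm output, for illustration only.
SAMPLE = """\
0804a1b0 T PyInit__io
0804b2c0 T PyInit_posix
         U dlopen
0804c3d0 T main
"""
```

An empty list from `still_static()` corresponds to the "no longer shows PyInit_<name>" result reported above.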

Prerequisites

Stacked on top of Phase 0 (esaurez/cpython#? → eventually nanvix/cpython#690). This PR's base is feat/phase0-array-so so the diff is only the 11 new modules; rebase onto nanvix/v3.12.3 once Phase 0 merges. See nanvix-todo/open-pr-merge-order.md for the full cross-repo merge order. No additional newlib / gcc / nanvix PRs are needed beyond what Phase 0 requires.

Risk

Mechanical configuration change only — adds 11 entries to the *shared* block of Setup.local. Each module's source is unchanged. If any module fails to load at runtime, the failure is isolated (an ImportError on that specific module); the rest of the interpreter is unaffected.
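To make the size of the change concrete, the sketch below renders a `*shared*` block in the Modules/Setup file format (a marker line followed by one `<name> <sources>` line per module). It does not reproduce the code in .nanvix/docker.py, which is not shown here; `shared_block` is a hypothetical helper, and the source file names are assumed to match CPython 3.12's Modules/Setup.stdlib.in and should be checked against the tree.

```python
# Module name -> C source file (assumed from CPython 3.12's Setup.stdlib.in).
PHASE_1A_MODULES = {
    "_bisect": "_bisectmodule.c",
    "_heapq": "_heapqmodule.c",
    "_struct": "_struct.c",
    "_random": "_randommodule.c",
    "_opcode": "_opcode.c",
    "_queue": "_queuemodule.c",
    "_csv": "_csv.c",
    "binascii": "binascii.c",
    "_json": "_json.c",
    "_pickle": "_pickle.c",
    "_zoneinfo": "_zoneinfo.c",
}


def shared_block(modules):
    """Render a *shared* section: the marker, then one line per module."""
    lines = ["*shared*"]
    lines += [f"{name} {source}" for name, source in modules.items()]
    return "\n".join(lines) + "\n"
```

Modules listed after the `*shared*` marker are built as loadable extensions rather than linked into the interpreter, which is why the change amounts to eleven added lines.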

Phase 1A of the .a -> .so migration (see
nanvix-todo/cpython-static-to-shared-migration.md section 5).
Builds on Phase 0 (#690 array) by promoting the remaining 11
Tier-1 "data primitives" stdlib extension modules from statically
linked into python.elf to dlopen-loaded shared objects under
lib/python3.12/lib-dynload/.

Modules moved to *shared* in Modules/Setup.local generation
(.nanvix/docker.py):

- _bisect, _heapq, _struct, _random, _opcode (5 simple pure-C)
- _queue, _csv, binascii (3 medium)
- _json, _pickle, _zoneinfo (3 larger)

All eleven modules have no external library dependencies (verified
by grepping their .c sources for #include <zlib|expat|openssl|sqlite|
mpdec|bzlib|lzma|hacl> — none match), so they need nothing beyond
the same -lc / -lm symbols that array.so already pulls from
python.elf's --whole-archive --export-dynamic main binary.

Test coverage (.nanvix/test.py):

- New phase1a_snippet imports each module, asserts it is NOT in
  sys.builtin_module_names, exercises one trivial API call to
  confirm dlopen + PyInit_<name> succeeded, and prints the
  resolved __file__ path so the host-side regex sees the .so
  came from lib-dynload/.
- Smoke test, lxml import, HTTP server smoke, and full regrtest
  (160/160 modules) all continue to pass.

Validation on local toolchain (phase0-llfix overlay of
phase0-stable with newlib %lld printf fix applied; see
nanvix-todo/newlib-z-missing-io-long-long-flag.md):

- All 11 new .so files produced (~30-230 KB each) and installed
  under lib/python3.12/lib-dynload/.
- nm python.elf no longer shows PyInit_<name> for any of the 11
  modules; the symbols moved to their respective .so files.
- python.elf size: 19.97 MB (Phase 0) -> 19.31 MB (Phase 1A),
  ~660 KB net reduction.
- Hello-world + Phase 1A import probe + lxml + HTTP smoke + full
  regrtest 160/160 PASS in standalone mode.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
esaurez pushed a commit that referenced this pull request Jun 3, 2026
Phase 1B of the .a -> .so migration (see
nanvix-todo/cpython-static-to-shared-migration.md section 5).
Builds on Phase 1A (#5) by promoting the remaining 5 Tier-1
"math + memory" stdlib extension modules from statically linked
into python.elf to dlopen-loaded shared objects.

Modules moved to *shared* in Modules/Setup.local generation
(.nanvix/docker.py):

- math, cmath, _statistics (libm consumers)
- mmap, _contextvars (libc-only)

None of the five reference external libraries beyond libc / libm.

==============================================================
libm is properly resolved through python.elf's .dynsym
==============================================================

This PR also drops libm.a from the per-module .so link
commands. Without this, cpython's Makefile attaches libm.a to
math / cmath / _statistics via the MODULE_*_LDFLAGS path,
producing duplicate libm copies in each .so plus python.elf:

  math.cpython-312.so         pre-drop: 532 KB   post-drop: 381 KB
  cmath.cpython-312.so        pre-drop: 180 KB   post-drop:  91 KB
  _statistics.cpython-312.so  pre-drop:  28 KB   post-drop:  18 KB

  python.elf size unchanged (libm.a is already linked via
  --whole-archive into the main binary; that hasn't changed).
  Total ramfs reduction: ~240 KB.
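
As a quick check of the figures above, the per-module savings can be tallied as in this sketch. The inputs are the rounded KB values from the table, so the sum only approximates the ~240 KB total quoted, which may have been computed from unrounded sizes.

```python
# (pre-drop KB, post-drop KB) per module, copied from the table above.
SIZES_KB = {
    "math.cpython-312.so": (532, 381),
    "cmath.cpython-312.so": (180, 91),
    "_statistics.cpython-312.so": (28, 18),
}


def savings(sizes):
    """Return the KB saved per file after dropping libm.a from the link."""
    return {name: pre - post for name, (pre, post) in sizes.items()}


per_module = savings(SIZES_KB)
total = sum(per_module.values())
```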

Implemented by passing --with-libm= (empty) to CPython configure
so the math / cmath / _statistics module link commands no longer
include libm.a. Each .so ends up with sqrt / cbrt / etc. as
undefined references that the dynamic loader resolves at dlopen
time against python.elf's .dynsym.

This is the canonical pattern for any CPython extension on
Nanvix (where libm is static) that uses libm: build the
extension with no libm in its link command; at dlopen time the
dynamic loader resolves libm names against python.elf's .dynsym,
which has them as GLOBAL DEFAULT because libm.a is in
python.elf's --whole-archive LIBS and the libposix
visibility-merge contamination has been fixed at the Nanvix
level (esaurez/nanvix#26). This mirrors how dlopen'd CPython
math modules resolve libm on Linux: the module leaves the libm
names undefined and the dynamic loader binds them at load time
(on Linux the definitions come from libm.so.6 rather than from
a statically linked main executable, but the lookup mechanism
is the same). Future Phase 2/3 modules with libm dependencies
automatically benefit from this without needing libm.a baked
into each .so.
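
One way to confirm that python.elf exports the libm names is to parse
the output of readelf --dyn-syms. The sketch below is illustrative
only: the helper names are hypothetical, the parser assumes the usual
eight-column readelf symbol rows, and the sample text is made up rather
than captured from a real build.

```python
def dynsym_table(readelf_output):
    """Map symbol name -> (bind, visibility) from readelf --dyn-syms output."""
    table = {}
    for line in readelf_output.splitlines():
        fields = line.split()
        # Symbol rows have eight columns: Num: Value Size Type Bind Vis Ndx Name.
        if len(fields) == 8 and fields[0].rstrip(":").isdigit():
            name = fields[7].split("@")[0]  # drop any symbol-version suffix
            table[name] = (fields[4], fields[5])
    return table


def exported(table, names):
    """Report, per name, whether it is present as GLOBAL DEFAULT."""
    return {n: table.get(n) == ("GLOBAL", "DEFAULT") for n in names}


# Made-up readelf output, for illustration only.
SAMPLE = """\
Symbol table '.dynsym' contains 4 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0812a4c0   212 FUNC    GLOBAL DEFAULT   12 sqrt
     2: 0812b100   340 FUNC    GLOBAL DEFAULT   12 cbrt
     3: 0812c000    96 FUNC    GLOBAL DEFAULT   12 fma
"""
```

A name that is absent from .dynsym, such as a library-private helper,
simply reports False here.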

==============================================================
Note on the 11 newlib-internal `__math_*` helpers
==============================================================

`__math_invalid`, `__math_oflow`, `__math_uflow`, `__math_divzero`,
etc. are GLOBAL HIDDEN at source in
`newlib/libm/common/math_config.h` (ported from ARM
optimized-routines, same as glibc and musl). They are
deliberately library-private and **do not** appear in
python.elf's .dynsym even with this PR — and that is correct,
not a bug. They are called only from inside libm's own
internals (e.g. __ieee754_sqrt -> __math_invalid), which live
inside python.elf alongside the helpers themselves; the
PC-relative call between them is resolved at static-link time
and never touches .dynsym. dlopen'd modules don't need them —
they call public names like sqrt, which dispatches to
python.elf's sqrt, which internally calls __math_invalid if
needed. This is identical to how Linux ships libm.so.6 with the
same HIDDEN attribute on the same helpers; dlopen'd Python
extensions on Linux never need to look them up either.

==============================================================
Test coverage (.nanvix/test.py)
==============================================================

New phase1b_snippet imports each module, asserts it is NOT in
sys.builtin_module_names, exercises one trivial API call to
confirm dlopen + PyInit_<name> succeeded, and prints the
resolved __file__ path. Phase 1A probe retained.

==============================================================
Prerequisites
==============================================================

Requires esaurez/nanvix#26 (libposix compiler_builtins libm
visibility fix) merged and the toolchain image rebuilt to carry
the patched libposix.a. Without that, math.so dlopen would fail
with "symbol not found" on sqrt — exactly the failure mode that
motivated nanvix#26.

==============================================================
Validation
==============================================================

Tested on phase0-llfix toolchain overlay:

- All 5 new .so files produced and installed under lib-dynload/.
- nm python.elf no longer shows PyInit_<name> for any of the 5.
- python.elf size: 19.31 MB (Phase 1A) -> 19.18 MB (Phase 1B),
  -130 KB.
- readelf --dyn-syms python.elf shows sqrt / cbrt / fma as
  GLOBAL DEFAULT (the libposix fix that landed in nanvix#26
  makes this work).
- math.sqrt(4.0) == 2.0 and cmath.sqrt(-1) == 1j round-trip via
  dlopen.
- Hello + Phase 1A probe + Phase 1B probe + lxml + HTTP smoke +
  full regrtest 160/160 PASS in standalone mode.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>