[nanvix] E: Phase 2b -- unbundle cpython-vendored libmpdec/libexpat/libHacl_Hash_SHA2 via MODLIBS piggyback + visibility-default#17

Open: esaurez wants to merge 1 commit into feat/drop-redundant-libs-from-python-elf from feat/phase2b-unbundle-via-visibility-flip


@esaurez esaurez commented Jun 6, 2026

Summary

Unbundles the three vendored static libraries that cpython ships inside its own source tree (Modules/_decimal/libmpdec/, Modules/expat/, Modules/_hacl/). Each library lives once in python.elf via the standard MODLIBS piggyback machinery, and the corresponding Phase 2 .so modules resolve their symbols against python.elf's .dynsym at dlopen time — same model as the Group A sysroot ports (zlib/bz2/lzma/sqlite3 in cpython#10) and the Phase 3b .so-via-DT_NEEDED chains (cpython#14 for libffi, cpython#15 for openssl).

Size impact

| Module | Before (bundled) | After (unbundled) | Saved |
| --- | --- | --- | --- |
| _decimal.cpython-312.so | 1632 KB | 810 KB | 822 KB |
| pyexpat.cpython-312.so | 749 KB | 197 KB | 552 KB |
| _sha2.cpython-312.so | 1809 KB | 65 KB | 1744 KB |
| _elementtree.cpython-312.so | 347 KB | unchanged | none (uses pyexpat CAPI hook, no direct expat link) |
| python.elf | 9976 KB | 10346 KB | +370 KB (vendored libs land here once) |

Net ramfs savings: ~2.7 MB.
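
The net figure can be rechecked from the per-file sizes. A minimal sketch, using only the numbers reported in the table; `_elementtree` is left out because its size does not change, and the helper name is illustrative:

```python
# Sizes in KB as (before, after), copied from the size table in this PR.
sizes_kb = {
    "_decimal.cpython-312.so": (1632, 810),
    "pyexpat.cpython-312.so": (749, 197),
    "_sha2.cpython-312.so": (1809, 65),
    "python.elf": (9976, 10346),  # grows: the vendored libs land here once
}


def net_savings_kb(sizes):
    """Sum (before - after) over all artifacts; growth counts as negative."""
    return sum(before - after for before, after in sizes.values())


net_kb = net_savings_kb(sizes_kb)
print(f"net savings: {net_kb} KB (~{net_kb / 1024:.1f} MB)")
# prints "net savings: 2748 KB (~2.7 MB)"
```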

Architecture (three fixes wired together)

1. MODLIBS piggyback

Append the three .a paths to the *static* _nanvix line in docker.py with -Wl,--whole-archive ... -Wl,--no-whole-archive. makesetup pulls them into LOCALMODLIBS and the python.elf link rule includes them with --whole-archive (ensuring every .o is pulled, not just those referenced from python.elf code).
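
As a rough illustration of the piggyback step, the sketch below builds a makesetup module line with the archives wrapped in the whole-archive flags. It is not the actual docker.py code: the `_nanvix _nanvixmodule.c` entry and the helper name are hypothetical, and the libmpdec and libexpat archive paths are assumptions based on cpython's source layout (only the libHacl path appears in this PR).

```python
# Assumed archive locations inside the cpython build tree. The libHacl path is
# quoted in this PR; the other two are assumptions, not taken from docker.py.
VENDORED_ARCHIVES = [
    "Modules/_decimal/libmpdec/libmpdec.a",
    "Modules/expat/libexpat.a",
    "Modules/_hacl/libHacl_Hash_SHA2.a",
]


def piggyback_line(module_line, archives=VENDORED_ARCHIVES):
    """Append the archives to a makesetup module line, wrapped so the linker
    keeps every object file rather than only the referenced ones."""
    wrapped = ["-Wl,--whole-archive", *archives, "-Wl,--no-whole-archive"]
    return " ".join([module_line.rstrip(), *wrapped])


# Hypothetical existing entry under *static*; the real line lives in docker.py.
line = piggyback_line("_nanvix _nanvixmodule.c")
print("*static*")
print(line)
```

Closing the group with -Wl,--no-whole-archive keeps the whole-archive behaviour from leaking onto any libraries that follow on the link line.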

2. visibility-default flip

CONFIGURE_CFLAGS_NODIST adds -fvisibility=hidden globally, which makes the vendored libs' public symbols STV_HIDDEN, and STV_HIDDEN symbols don't reach .dynsym even with -Wl,--export-dynamic. The vendored libs' own pragma guards (mpdecimal.h, expat headers, HACL headers) are no-ops on i686-nanvix (the libmpdec pragma is gated on __linux__ || __FreeBSD__ || __APPLE__, none of which Nanvix defines), so nothing restores default visibility.

Patch the generated Makefile to append -fvisibility=default to LIBMPDEC_CFLAGS, LIBEXPAT_CFLAGS, and LIBHACL_CFLAGS so the last -fvisibility= flag wins. The symbols then land as STV_DEFAULT, get exported by --export-dynamic, and are visible in python.elf's .dynsym:

$ nm -D python.elf | grep -cE ' T mpd_'
235
$ nm -D python.elf | grep -cE ' T PyExpat_XML_'
69
$ nm -D python.elf | grep -cE ' T python_hashlib_Hacl_Hash_SHA2'
18
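
The Makefile patch could look roughly like the sketch below. This is an illustration rather than the code in this PR: the function names are made up, and it assumes each of the three variables is assigned on a single line (no backslash continuation) in the generated Makefile. The second helper mirrors the "last flag wins" rule described above.

```python
import re

VENDORED_CFLAGS_VARS = ("LIBMPDEC_CFLAGS", "LIBEXPAT_CFLAGS", "LIBHACL_CFLAGS")
FLAG = "-fvisibility=default"


def flip_visibility(makefile_text, variables=VENDORED_CFLAGS_VARS):
    """Append -fvisibility=default to each matching `VAR=` assignment line.

    Assumes single-line assignments; a line that already carries the flag
    is left alone, so the patch is safe to apply twice.
    """
    out = []
    for line in makefile_text.splitlines():
        m = re.match(r"^([A-Z_]+)\s*=", line)
        if m and m.group(1) in variables and FLAG not in line.split():
            line = f"{line.rstrip()} {FLAG}"
        out.append(line)
    return "\n".join(out) + "\n"


def effective_visibility(flags):
    """Return the value of the last -fvisibility= flag, or None if absent."""
    value = None
    for tok in flags.split():
        if tok.startswith("-fvisibility="):
            value = tok.split("=", 1)[1]
    return value


# Hypothetical Makefile fragment, for illustration only.
sample = "LIBMPDEC_CFLAGS=-I./Modules/_decimal/libmpdec -fvisibility=hidden\nCC=gcc\n"
patched = flip_visibility(sample)
print(patched, end="")
```

Appending rather than rewriting the variable keeps the rest of the flags untouched, which matches the targeted scope of the patch.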

3. Drop per-module bundling

Override MODULE__DECIMAL_LDFLAGS= and MODULE_PYEXPAT_LDFLAGS= at make time so the per-module .so links no longer include -lm + LIBMPDEC_A / LIBEXPAT_A. For _sha2, remove Modules/_hacl/libHacl_Hash_SHA2.a from its Setup.local line (it was the explicit bundling path).
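
A minimal sketch of the two halves of this step, assuming the build is driven from a Python script: empty make command-line overrides for the two LDFLAGS variables, and removal of the archive token from the _sha2 line. The helper names and the sample _sha2 line are hypothetical; the real Setup.local entry may list different tokens.

```python
HACL_ARCHIVE = "Modules/_hacl/libHacl_Hash_SHA2.a"


def make_overrides():
    """Make command-line assignments; an empty value clears the variable and
    takes precedence over the assignment inside the Makefile."""
    return ["MODULE__DECIMAL_LDFLAGS=", "MODULE_PYEXPAT_LDFLAGS="]


def drop_archive(setup_line, archive=HACL_ARCHIVE):
    """Remove one archive token from a makesetup module line."""
    return " ".join(tok for tok in setup_line.split() if tok != archive)


# Hypothetical _sha2 entry, for illustration only.
before = "_sha2 sha2module.c -I$(srcdir)/Modules/_hacl/include " + HACL_ARCHIVE
after = drop_archive(before)
cmd = ["make", *make_overrides()]

print(after)
print(" ".join(cmd))
```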

Why not --with-system-libmpdec / --with-system-expat?

I researched distro practices first (report archive). Fedora, Arch, FreeBSD, and Homebrew all use these --with-system-* flags to consume a separately built libmpdec/libexpat from the OS packages. For our case, though, those flags would:

  • Set LIBMPDEC_INTERNAL= (empty), which means cpython stops building the vendored .a at all.
  • Expect a system libmpdec.so / libexpat.so at link time.

We want to keep cpython's vendored build (there is no separate Nanvix port of libmpdec) but stop duplicating the .a into each consumer .so. The MODLIBS piggyback plus the visibility flip fits that goal. objcopy --globalize-symbols was considered as an alternative but is more invasive (post-build binary surgery versus a compile-time flag).

Validation

STEP_1:python_started (3, 12, 3)
STEP_2:_sha2 imported /lib/python3.12/lib-dynload/_sha2.cpython-312.so
STEP_3:_sha2.sha256 digest=a7f9173ca23c3e49...
STEP_4:_decimal imported /lib/python3.12/lib-dynload/_decimal.cpython-312.so
STEP_5:_decimal math: pi*e = 8.539734222673567065455462291
STEP_6:pyexpat imported /lib/python3.12/lib-dynload/pyexpat.cpython-312.so
STEP_7:pyexpat parsed: ['root', 'child']
STEP_8:_elementtree imported /lib/python3.12/lib-dynload/_elementtree.cpython-312.so
STEP_9:xml.etree parsed: root=root child=child a=1
PHASE2_PASS

Regression-tested: LXML_CHAIN_PASS + CTYPES_CHAIN_PASS + OPENSSL_CHAIN_PASS all still pass.
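
The validation script itself is not part of this description. A smoke test along the same lines might look like the sketch below, which goes through the public stdlib wrappers (hashlib, decimal, xml.parsers.expat, xml.etree) rather than importing the C modules by name. The input values are placeholders, not the ones behind the log above.

```python
import hashlib
import xml.etree.ElementTree as ET
import xml.parsers.expat
from decimal import Decimal, localcontext

XML_DOC = "<root><child a='1'/></root>"


def run_smoke():
    """Exercise SHA-256, decimal arithmetic, expat, and ElementTree once each
    and return the report lines; any failure raises before PHASE2_PASS."""
    lines = []

    # Placeholder input; the real test's digest input is not shown in the PR.
    digest = hashlib.sha256(b"nanvix").hexdigest()
    lines.append(f"sha256 digest={digest[:16]}...")

    with localcontext() as ctx:
        ctx.prec = 28
        # Placeholder operands approximating pi and e.
        product = Decimal("3.141592653589793") * Decimal("2.718281828459045")
    lines.append(f"decimal product = {product}")

    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(XML_DOC.encode(), True)
    lines.append(f"pyexpat parsed: {names}")

    root = ET.fromstring(XML_DOC)
    child = root[0]
    lines.append(
        f"xml.etree parsed: root={root.tag} child={child.tag} a={child.get('a')}"
    )
    lines.append("PHASE2_PASS")
    return lines


report = run_smoke()
print("\n".join(report))
```

On the target, a failed symbol lookup against python.elf's .dynsym would surface as an ImportError when the extension module is first loaded, so the pass marker is only reached if every module loaded.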

Dependencies

No new nanvix or sysroot dependencies.

Notes

  • The -fvisibility=default Makefile patch is targeted to the three specific LIB*_CFLAGS variables only — the rest of cpython still compiles with -fvisibility=hidden per upstream convention.
  • The pragma in mpdecimal.h is upstream-controlled (vendored from bytereef.org/mpdecimal/) and only fires on Linux/FreeBSD/macOS; the visibility=default flip is the right complement on Nanvix.
  • HACL has no --with-system-libhacl option upstream (the lib is intended to be vendored), so the MODLIBS+visibility approach is the only available unbundling path for _sha2.

…ibHacl_Hash_SHA2 via MODLIBS piggyback + visibility-default

Unbundles the three vendored static libraries that cpython ships inside its own source tree (libmpdec for _decimal; libexpat for pyexpat + _elementtree; libHacl_Hash_SHA2 for _sha2).  Each library lives once in python.elf via the standard MODLIBS piggyback machinery, and the corresponding Phase 2 .so modules resolve their symbols against python.elf .dynsym at dlopen time -- same model as the Group A sysroot ports (zlib/bz2/lzma/sqlite3 in cpython#10) and the Phase 3b .so-via-DT_NEEDED chains (libffi in cpython#14, openssl in cpython#15).

Architecture (3 fixes wired together):

1. MODLIBS piggyback: append the three .a paths to the *static* _nanvix line in docker.py with -Wl,--whole-archive ... -Wl,--no-whole-archive, so makesetup pulls them into LOCALMODLIBS and the python.elf link rule includes them with --whole-archive (ensuring every .o is pulled, not just those referenced from python.elf code).

2. visibility-default flip: cpython's CONFIGURE_CFLAGS_NODIST adds -fvisibility=hidden globally, which makes the vendored libs' public symbols STV_HIDDEN, because their own pragma guards (mpdecimal.h, expat headers, HACL headers) are no-ops on i686-nanvix (the libmpdec pragma is gated on __linux__ / __FreeBSD__ / __APPLE__, none of which Nanvix defines).  Patch the generated Makefile to append -fvisibility=default to LIBMPDEC_CFLAGS, LIBEXPAT_CFLAGS, and LIBHACL_CFLAGS so the last -fvisibility= flag wins.  Symbols land as STV_DEFAULT, get exported by --export-dynamic, and are visible in python.elf .dynsym.

3. drop per-module bundling: override MODULE__DECIMAL_LDFLAGS= and MODULE_PYEXPAT_LDFLAGS= at make time so the per-module .so links no longer include -lm + LIBMPDEC_A / LIBEXPAT_A.  For _sha2, remove Modules/_hacl/libHacl_Hash_SHA2.a from its Setup.local line (it was the explicit bundling path).

Sizes (cpython-dev integration branch):

  _decimal.cpython-312.so:  1632 KB -> 810 KB   (-822 KB)
  pyexpat.cpython-312.so:    749 KB -> 197 KB   (-552 KB)
  _sha2.cpython-312.so:     1809 KB ->  65 KB   (-1744 KB)
  _elementtree:              unchanged (uses pyexpat CAPI hook, no direct expat link)
  python.elf:               9976 KB -> 10346 KB  (+370 KB; vendored libs land here once)

  Net ramfs savings: ~2.7 MB

Validation:

  PHASE2_PASS: _sha2.sha256, _decimal multi-digit pi*e math, pyexpat XML parse, xml.etree (via _elementtree+pyexpat capsule)
  Regression: LXML_CHAIN_PASS + CTYPES_CHAIN_PASS + OPENSSL_CHAIN_PASS still pass.

Research backing: see nanvix-todo/cpython-phase2-bundled-libs-hidden-visibility-blocker.md for the full investigation of why this approach was chosen (rather than --with-system-libmpdec, --with-system-expat, or objcopy --globalize-symbols).  Short version: distros use --with-system-libmpdec but that REPLACES the bundled build (LIBMPDEC_INTERNAL= means the .a is not built at all); we want to KEEP the bundled .a but unbundle from the .so consumers.  -fvisibility=default is the cleanest way to flip the symbol visibility for a specific set of .c files without rebuilding the world.

Stacks on cpython#16 (drop -lssl -lcrypto -lffi from python.elf LIBS) which is the parent branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>