[nanvix] E: Phase 2b -- unbundle cpython-vendored libmpdec/libexpat/libHacl_Hash_SHA2 via MODLIBS piggyback + visibility-default #17
esaurez wants to merge 1 commit into
…ibHacl_Hash_SHA2 via MODLIBS piggyback + visibility-default

Unbundles the three vendored static libraries that cpython ships inside its own source tree (libmpdec for _decimal; libexpat for pyexpat + _elementtree; libHacl_Hash_SHA2 for _sha2). Each library lives once in python.elf via the standard MODLIBS piggyback machinery, and the corresponding Phase 2 .so modules resolve their symbols against python.elf .dynsym at dlopen time -- same model as the Group A sysroot ports (zlib/bz2/lzma/sqlite3 in cpython#10) and the Phase 3b .so-via-DT_NEEDED chains (libffi in cpython#14, openssl in cpython#15).

Architecture (3 fixes wired together):

1. MODLIBS piggyback: append the three .a paths to the *static* _nanvix line in docker.py with -Wl,--whole-archive ... -Wl,--no-whole-archive, so makesetup pulls them into LOCALMODLIBS and the python.elf link rule includes them with --whole-archive (ensuring every .o is pulled, not just those referenced from python.elf code).

2. visibility-default flip: cpython's CONFIGURE_CFLAGS_NODIST adds -fvisibility=hidden globally, which makes the vendored libs' public symbols STV_HIDDEN; their own pragma guards (mpdecimal.h, expat headers, HACL headers) don't counteract this because they are no-ops on i686-nanvix (the libmpdec pragma is gated on __linux__ / __FreeBSD__ / __APPLE__, none of which Nanvix defines). Patch the generated Makefile to append -fvisibility=default to LIBMPDEC_CFLAGS, LIBEXPAT_CFLAGS, LIBHACL_CFLAGS so the last -fvisibility= flag wins. Symbols land as STV_DEFAULT, get exported by --export-dynamic, and are visible in python.elf .dynsym.

3. drop per-module bundling: override MODULE__DECIMAL_LDFLAGS= and MODULE_PYEXPAT_LDFLAGS= at make time so the per-module .so links no longer include -lm + LIBMPDEC_A / LIBEXPAT_A. For _sha2, remove Modules/_hacl/libHacl_Hash_SHA2.a from its Setup.local line (it was the explicit bundling path).

Sizes (cpython-dev integration branch):
  _decimal.cpython-312.so: 1632 KB -> 810 KB (-822 KB)
  pyexpat.cpython-312.so: 749 KB -> 197 KB (-552 KB)
  _sha2.cpython-312.so: 1809 KB -> 65 KB (-1744 KB)
  _elementtree: unchanged (uses pyexpat CAPI hook, no direct expat link)
  python.elf: 9976 KB -> 10346 KB (+370 KB; vendored libs land here once)
  Net ramfs savings: ~2.7 MB

Validation:
  PHASE2_PASS: _sha2.sha256, _decimal multi-digit pi*e math, pyexpat XML parse, xml.etree (via _elementtree+pyexpat capsule)
  Regression: LXML_CHAIN_PASS + CTYPES_CHAIN_PASS + OPENSSL_CHAIN_PASS still pass.

Research backing: see nanvix-todo/cpython-phase2-bundled-libs-hidden-visibility-blocker.md for the full investigation of why this approach was chosen (rather than --with-system-libmpdec, --with-system-expat, or objcopy --globalize-symbols). Short version: distros use --with-system-libmpdec, but that REPLACES the bundled build (LIBMPDEC_INTERNAL= means the .a is not built at all); we want to KEEP the bundled .a but unbundle it from the .so consumers. -fvisibility=default is the cleanest way to flip the symbol visibility for a specific set of .c files without rebuilding the world.

Stacks on cpython#16 (drop -lssl -lcrypto -lffi from python.elf LIBS), which is the parent branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This was referenced Jun 9, 2026
Summary
Unbundles the three vendored static libraries that cpython ships inside its own source tree (`Modules/_decimal/libmpdec/`, `Modules/expat/`, `Modules/_hacl/`). Each library lives once in `python.elf` via the standard MODLIBS piggyback machinery, and the corresponding Phase 2 `.so` modules resolve their symbols against `python.elf`'s `.dynsym` at dlopen time. This is the same model as the Group A sysroot ports (zlib/bz2/lzma/sqlite3 in cpython#10) and the Phase 3b `.so`-via-`DT_NEEDED` chains (cpython#14 for libffi, cpython#15 for openssl).

Size impact
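| Artifact | Before | After | Change |
|---|---|---|---|
| `_decimal.cpython-312.so` | 1632 KB | 810 KB | -822 KB |
| `pyexpat.cpython-312.so` | 749 KB | 197 KB | -552 KB |
| `_sha2.cpython-312.so` | 1809 KB | 65 KB | -1744 KB |
| `_elementtree.cpython-312.so` | | | unchanged (uses the pyexpat CAPI hook, no direct expat link) |
| `python.elf` | 9976 KB | 10346 KB | +370 KB (vendored libs land here once) |

Net ramfs savings: ~2.7 MB.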
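As a quick cross-check of the net figure, the sizes reported in the commit message can be summed in Python. The dictionary only restates those numbers (in KB), and `net_savings_kb` is an illustrative helper, not part of the build.

```python
# Reported sizes in KB as (before, after), copied from the commit message.
SIZES_KB = {
    "_decimal.cpython-312.so": (1632, 810),
    "pyexpat.cpython-312.so": (749, 197),
    "_sha2.cpython-312.so": (1809, 65),
    "python.elf": (9976, 10346),  # grows: the vendored libs now live here once
}


def net_savings_kb(sizes: dict) -> int:
    """Total shrinkage across all artifacts; growth counts as negative savings."""
    return sum(before - after for before, after in sizes.values())


if __name__ == "__main__":
    total = net_savings_kb(SIZES_KB)
    print(f"net savings: {total} KB (~{total / 1024:.1f} MiB)")
```

Running it prints `net savings: 2748 KB (~2.7 MiB)`, in line with the ~2.7 MB quoted; `_elementtree` is left out because its size did not change.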
Architecture (three fixes wired together)
1. MODLIBS piggyback
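A minimal Python sketch of the line this step builds. It assumes the archive paths are CPython 3.12's in-tree defaults and uses a placeholder `_nanvix _nanvix.c` entry; the real line in `docker.py` is not reproduced here, and `piggyback_line` is a hypothetical helper:

```python
# Hypothetical sketch, not the actual docker.py code.
# Archive paths assume CPython 3.12's in-tree build locations.
VENDORED_ARCHIVES = (
    "Modules/_decimal/libmpdec/libmpdec.a",  # consumed by _decimal
    "Modules/expat/libexpat.a",              # consumed by pyexpat (and _elementtree)
    "Modules/_hacl/libHacl_Hash_SHA2.a",     # consumed by _sha2
)


def piggyback_line(base_line: str, archives=VENDORED_ARCHIVES) -> str:
    """Append the vendored archives to a *static* Setup line.

    --whole-archive makes the linker keep every member .o, even ones that no
    code in python.elf references; --no-whole-archive closes the region so
    anything listed after it is searched normally again.
    """
    return " ".join([base_line, "-Wl,--whole-archive", *archives, "-Wl,--no-whole-archive"])


if __name__ == "__main__":
    # Placeholder base entry; the real _nanvix line may list other sources/flags.
    print("*static*")
    print(piggyback_line("_nanvix _nanvix.c"))
```

Keeping `--no-whole-archive` right after the three archives matters: without it, any archive appended later to the same link line would also be force-included in full.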
Append the three `.a` paths to the `*static*` `_nanvix` line in `docker.py` with `-Wl,--whole-archive ... -Wl,--no-whole-archive`. makesetup pulls them into `LOCALMODLIBS`, and the `python.elf` link rule includes them with `--whole-archive` (ensuring every `.o` is pulled, not just those referenced from `python.elf` code).

2. visibility-default flip
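In sketch form, the Makefile patch for this step is a line-based rewrite of the generated Makefile. This Python version assumes each of the three variables is a single-line `NAME=value` assignment (backslash-continued lines are skipped), and the sample assignment is illustrative. `patch_makefile_text` and `effective_visibility` are hypothetical helpers; the second only models GCC's rule that the last `-fvisibility=` option wins.

```python
import re

TARGET_VARS = ("LIBMPDEC_CFLAGS", "LIBEXPAT_CFLAGS", "LIBHACL_CFLAGS")
FLAG = "-fvisibility=default"


def patch_makefile_text(text: str, names=TARGET_VARS, flag: str = FLAG) -> str:
    """Append `flag` to single-line `NAME=...` assignments for the given names.

    Idempotent: a line that already carries the flag is left unchanged.
    Lines ending in a backslash continuation are skipped (assumption: the
    target variables are single-line in the generated Makefile).
    """
    assign = re.compile(r"^(?:%s)\s*=" % "|".join(map(re.escape, names)))
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\n")
        newline = line[len(body):]  # "\n", or "" on a final unterminated line
        if assign.match(body) and not body.endswith("\\") and flag not in body.split():
            line = f"{body} {flag}{newline}"
        out.append(line)
    return "".join(out)


def effective_visibility(argv: list) -> str:
    """Model GCC's rule: the last -fvisibility= option on the command line wins."""
    visibility = "default"  # GCC's behaviour when no -fvisibility= is given
    for arg in argv:
        if arg.startswith("-fvisibility="):
            visibility = arg.split("=", 1)[1]
    return visibility


if __name__ == "__main__":
    # Illustrative assignment; real generated values will differ.
    demo = "LIBMPDEC_CFLAGS=-I$(srcdir)/Modules/_decimal/libmpdec $(PY_STDMODULE_CFLAGS)\n"
    print(patch_makefile_text(demo), end="")
    print(effective_visibility(["-O2", "-fvisibility=hidden", "-fvisibility=default"]))
```

Appending only helps if the flag ends up after the global `-fvisibility=hidden` on each compile command; that depends on how the compile rules expand these variables, which this sketch takes as given.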
`CONFIGURE_CFLAGS_NODIST` adds `-fvisibility=hidden` globally, which makes the vendored libs' public symbols `STV_HIDDEN`, and `STV_HIDDEN` symbols don't reach `.dynsym` even with `-Wl,--export-dynamic`. The vendored libs' own pragma guards (`mpdecimal.h`, expat headers, HACL headers) are no-ops on i686-nanvix (the libmpdec pragma is gated on `__linux__ || __FreeBSD__ || __APPLE__`, none of which Nanvix defines).

Patch the generated Makefile to append `-fvisibility=default` to `LIBMPDEC_CFLAGS`, `LIBEXPAT_CFLAGS`, and `LIBHACL_CFLAGS` so the last `-fvisibility=` flag wins. The symbols then land as `STV_DEFAULT`, get exported by `--export-dynamic`, and are visible in `python.elf`'s `.dynsym`.

3. Drop per-module bundling
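Both halves of this step are small enough to sketch in Python. The make overrides rely on GNU make's rule that a variable assigned on the command line overrides ordinary assignments in the Makefile, so empty values blank the per-module link flags; the `_sha2` change is a token filter on its Setup line. The helper names and the sample `_sha2` line are illustrative assumptions, not the real `Setup.local` contents.

```python
import shlex

SHA2_ARCHIVE = "Modules/_hacl/libHacl_Hash_SHA2.a"


def make_argv(jobs: int = 4) -> list:
    """Build a make invocation that blanks the per-module bundling flags.

    In GNU make, a variable assigned on the command line overrides ordinary
    assignments of that variable inside the Makefile.
    """
    return [
        "make",
        f"-j{jobs}",
        "MODULE__DECIMAL_LDFLAGS=",  # drops -lm + LIBMPDEC_A from _decimal's link
        "MODULE_PYEXPAT_LDFLAGS=",   # drops the bundled LIBEXPAT_A from pyexpat's link
    ]


def drop_archive(setup_line: str, archive: str = SHA2_ARCHIVE) -> str:
    """Remove the explicitly bundled archive from a Setup.local module line."""
    return " ".join(tok for tok in setup_line.split() if tok != archive)


if __name__ == "__main__":
    print(shlex.join(make_argv()))
    # Illustrative _sha2 entry; the real Setup.local line may differ.
    print(drop_archive("_sha2 sha2module.c -I$(srcdir)/Modules/_hacl/include " + SHA2_ARCHIVE))
```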
Override `MODULE__DECIMAL_LDFLAGS=` and `MODULE_PYEXPAT_LDFLAGS=` at make time so the per-module `.so` links no longer include `-lm` + `LIBMPDEC_A` / `LIBEXPAT_A`. For `_sha2`, remove `Modules/_hacl/libHacl_Hash_SHA2.a` from its `Setup.local` line (it was the explicit bundling path).

Why not `--with-system-libmpdec` / `--with-system-expat`?

Researched distro practices first (report archive). Every major distro (Fedora, Arch, FreeBSD, Homebrew) uses these `--with-system-*` flags to consume a separately built libmpdec/libexpat from the OS packages. But for our case those flags would:

- set `LIBMPDEC_INTERNAL=` (empty), which means cpython stops building the vendored `.a` at all;
- require an external `libmpdec.so`/`libexpat.so` at link time.

We want to keep cpython's vendored build (no separate Nanvix port of libmpdec) but stop duplicating the `.a` into each consumer `.so`. The MODLIBS piggyback + visibility flip is the right pattern for that. `objcopy --globalize-symbols` was considered as an alternative but is more invasive (post-build binary surgery vs. a compile-time flag).

Validation
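A Phase 2 smoke test along these lines can be sketched in Python. The actual `PHASE2_PASS` harness is not part of this description, so the checks below are assumptions that exercise the same paths: importing the extension modules directly makes a symbol that fails to resolve against `python.elf` surface as an `ImportError` at dlopen time, and the public-API checks mirror the listed `_sha2`, `_decimal`, `pyexpat`, and `xml.etree` cases. The module names assume CPython 3.12 (`_sha2` does not exist in earlier releases).

```python
import decimal
import hashlib
import importlib
import math
import xml.etree.ElementTree as ET
from xml.parsers import expat

# Extension modules whose .so files now resolve vendored symbols from python.elf.
PHASE2_EXTENSIONS = ("_sha2", "_decimal", "pyexpat", "_elementtree")


def import_failures(names=PHASE2_EXTENSIONS) -> dict:
    """Try to import each module; return {name: error} for those that fail."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # unresolved symbols at dlopen surface here
            failures[name] = str(exc)
    return failures


def phase2_checks() -> bool:
    # SHA-256: FIPS 180-2 test vector for "abc".
    assert hashlib.sha256(b"abc").hexdigest() == (
        "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
    )

    # Decimal: 40-digit pi * e, cross-checked against float math.
    ctx = decimal.Context(prec=40)
    pi = ctx.create_decimal("3.141592653589793238462643383279502884197")
    e = ctx.create_decimal("2.718281828459045235360287471352662497757")
    product = ctx.multiply(pi, e)
    assert len(product.as_tuple().digits) == 40
    assert abs(float(product) - math.pi * math.e) < 1e-12

    # pyexpat: raw parser callbacks.
    seen = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: seen.append(name)
    parser.Parse(b"<root><child a='1'/></root>", True)
    assert seen == ["root", "child"]

    # xml.etree: uses the _elementtree accelerator when it is available.
    root = ET.fromstring("<root><child a='1'/></root>")
    assert root.find("child").get("a") == "1"
    return True


if __name__ == "__main__":
    failed = import_failures()
    print("import failures:", failed or "none")
    print("PHASE2 checks:", "pass" if not failed and phase2_checks() else "fail")
```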
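`PHASE2_PASS`: `_sha2.sha256`, `_decimal` multi-digit `pi*e` math, `pyexpat` XML parse, `xml.etree` (via `_elementtree` + pyexpat capsule).

Regression-tested: `LXML_CHAIN_PASS` + `CTYPES_CHAIN_PASS` + `OPENSSL_CHAIN_PASS` all still pass.

Dependencies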
- cpython#16: drop `-lssl -lcrypto -lffi` from `python.elf` `LIBS`. This PR stacks on its branch.

No new nanvix or sysroot dependencies.
Notes
- The `-fvisibility=default` Makefile patch is targeted to the three specific `LIB*_CFLAGS` variables only; the rest of cpython still compiles with `-fvisibility=hidden` per upstream convention.
- `mpdecimal.h` is upstream-controlled (vendored from bytereef.org/mpdecimal/), and its visibility pragma only fires on Linux/FreeBSD/macOS; the `visibility=default` flip is the right complement on Nanvix.
- There is no `--with-system-libhacl` option upstream (the lib is intended to be vendored), so the MODLIBS + visibility approach is the only available unbundling path for `_sha2`.
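To confirm the visibility flip on a built image, one option is to inspect dynamic symbol tables with binutils `readelf --dyn-syms --wide`. The Python sketch below parses that output and lists the undefined symbols of a module `.so` that `python.elf` does not export as defined `GLOBAL`/`WEAK` symbols with `DEFAULT` visibility. The helper names are hypothetical. Comparing against the module's own undefined symbols avoids guessing names, since CPython renames some vendored symbols (for example, expat's via `pyexpatns.h`).

```python
import re
import subprocess

# Matches symbol-table rows such as "     2: 0812a0b0   312 FUNC    GLOBAL DEFAULT   13 mpd_qadd".
_ROW = re.compile(r"^\s*\d+:\s")


def parse_dynsym(readelf_output: str) -> tuple:
    """Split `readelf --dyn-syms --wide` output into (exported, undefined) name sets.

    Exported means defined (Ndx != UND), GLOBAL or WEAK binding, DEFAULT visibility.
    """
    exported, undefined = set(), set()
    for line in readelf_output.splitlines():
        if not _ROW.match(line):
            continue
        fields = line.split()
        if len(fields) < 8:
            continue  # unnamed entry, e.g. the null symbol at index 0
        bind, vis, ndx = fields[4], fields[5], fields[6]
        name = fields[7].split("@", 1)[0]  # strip any symbol-version suffix
        if ndx == "UND":
            undefined.add(name)
        elif bind in ("GLOBAL", "WEAK") and vis == "DEFAULT":
            exported.add(name)
    return exported, undefined


def unresolved(module_dynsym: str, host_dynsym: str) -> list:
    """Undefined symbols of the module that the host does not export."""
    _, needed = parse_dynsym(module_dynsym)
    provided, _ = parse_dynsym(host_dynsym)
    return sorted(needed - provided)


def readelf_dynsym(path: str) -> str:
    """Run binutils readelf on `path` (requires readelf on PATH)."""
    return subprocess.run(
        ["readelf", "--dyn-syms", "--wide", path],
        capture_output=True, text=True, check=True,
    ).stdout
```

For example, `unresolved(readelf_dynsym("_decimal.cpython-312.so"), readelf_dynsym("python.elf"))` would be expected to list no libmpdec symbols once the flip is in place; anything it does list may still be satisfied by another `DT_NEEDED` library, so treat the result as a starting point for inspection.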