[nanvix] E: Build stdlib extensions as .so #732
Open
esaurez wants to merge 2 commits into
Builds 39 CPython stdlib extension modules as runtime-loaded .so files under
lib/python3.12/lib-dynload/<name>.cpython-312.so instead of statically linking
them into python.elf. Generalises the *shared* Setup.local flow first proven by
feat/phase0-array-so.

Modules built as .so
--------------------
- Data primitives (no external deps): _bisect, _heapq, _struct, _random,
  _opcode, _queue, _csv, binascii, _json, _pickle, _zoneinfo.
- Math and memory (libm symbols resolved against python.elf): math, cmath,
  _statistics, mmap, _contextvars.
- Text codecs (no external deps): unicodedata, _multibytecodec,
  _codecs_cn/hk/iso2022/jp/kr/tw.
- With bundled-in-cpython C deps (each .so bundles its own .a, same as
  upstream cpython's default ./configure run): _asyncio, _datetime, _decimal,
  pyexpat, _elementtree, _md5, _sha1, _sha2, _sha3, _blake2, select, _socket,
  _posixsubprocess, fcntl, termios.

For the modules with bundled-in-cpython C deps (_decimal, pyexpat, _sha2),
each .so bundles its own copy of the vendored .a via cpython's normal
MODULE_*_LDFLAGS machinery -- exactly the behavior of cpython's default
./configure run on Linux without --with-system-libmpdec /
--with-system-expat. mpdec / expat / libHacl_Hash_SHA2 are stateless C APIs
with at most one consumer each (pyexpat exposes a PyCapsule that _elementtree
calls through, so libexpat lives in pyexpat.so only), so duplication into
python.elf would buy nothing.

--with-libm= is cleared so libm symbols stay in python.elf -- math / cmath
resolve them via the existing --whole-archive of libm.a in LIBNVX_CRT0.

Infrastructure
--------------
- .nanvix/setup_local.py (new): SETUP_LOCAL_ENTRIES data table +
  render_setup_local() -- single source of truth for Modules/Setup.local,
  consumed by both the host (.nanvix/lxml.py) and Docker (.nanvix/docker.py)
  build paths.
- .nanvix/test.py: _SO_MODULE_SANITY_CHECKS table +
  _render_so_sanity_snippets() emit smoke-test snippets that exercise every
  migrated module via import + a trivial method call and assert each module
  is no longer in sys.builtin_module_names.
- Makefile.nanvix: --with-libm= cleared (Phase 1B math/cmath modules resolve
  libm symbols against python.elf .dynsym at dlopen time via the existing
  --whole-archive of libm.a in LIBNVX_CRT0).

Runtime dependencies (already merged upstream)
----------------------------------------------
- nanvix/nanvix#2472 -- libm visibility fix.
- nanvix/nanvix#2473 -- dlfcn init-array + DT_RUNPATH support.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…FLAGS

When --with-system-libmpdec / --with-system-expat is *not* given, the
bundled-library code paths in configure.ac (lines 3831, 3915) hardcoded a
literal `-lm` in:

    LIBEXPAT_LDFLAGS="-lm $(LIBEXPAT_A)"
    LIBMPDEC_LDFLAGS="-lm $(LIBMPDEC_A)"

On Linux this happens to work because LIBM defaults to `-lm`, so the result
is a no-op. On Nanvix we pass `--with-libm=` (empty) to keep libm.a out of
every Setup.local `.so` -- libm is whole-archived into python.elf instead.
The literal `-lm` defeats that and bundles a full copy of libm.a into both
`_decimal.so` and `pyexpat.so` -- about 400 KB of redundant code per .so,
plus a symbol-collision risk that forces --allow-multiple-definition at every
other .so link.

Switching the literal to `$(LIBM)` is equivalent on Linux (LIBM=-lm by
default) and correctly drops the duplicate libm when LIBM is empty. Both
`configure.ac` and the generated `configure` are patched in lockstep so no
autoreconf is required.

Bug surfaced by the cpython-on-Nanvix .so management audit; tracked for a
follow-up upstream contribution to python/cpython once the Nanvix port
stabilizes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Builds 39 CPython stdlib extension modules as runtime-loaded `.so` files under
`lib/python3.12/lib-dynload/<name>.cpython-312.so` instead of statically
linking them into `python.elf`. Continues the work started by
`feat/phase0-array-so` (the `array` proof of concept), generalising the
`*shared*` `Setup.local` flow across the CPython stdlib.

Base branch: `feat/phase0-array-so`. This PR is filed against
`feat/phase0-array-so`, so it should be merged after #690 lands.
Why
Before this change, every CPython extension that Nanvix builds was forced into
`*static*` in `Modules/Setup.local`, so the entire stdlib extension surface was
linked into a monolithic `python.elf`. `lib-dynload/` was empty. Side effects:

- `python.elf` carried every extension's code even when a workload uses only a
  fraction of it.
- … `dlopen()` at runtime because the CPython build was not actually producing
  extension `.so` files. This blocked any out-of-tree extension story.
- … libraries (`_decimal` / `libmpdec`, `pyexpat` / `libexpat`, the SHA-2 hash
  module / `libHacl_Hash_SHA2`) paid the bundling cost as part of the monolith.

This PR enables the dlopen flow for the stdlib by making 39 modules
`*shared*`, matching upstream CPython's default `./configure` behavior on
Linux.
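To make the `*static*` to `*shared*` switch concrete, the sketch below renders a `Modules/Setup.local` body from a small data table. It is an illustration only: the `SetupEntry` shape, the three sample entries, and this `render_setup_local()` signature are assumptions, not the actual contents of `.nanvix/setup_local.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupEntry:
    """One module line of Modules/Setup.local (hypothetical shape)."""

    name: str                      # module name, e.g. "_bisect"
    sources: tuple[str, ...]       # C sources, relative to Modules/
    extra: tuple[str, ...] = ()    # extra flags or archives, if any


# Illustrative subset; the real table would cover all 39 modules.
ENTRIES = (
    SetupEntry("_bisect", ("_bisectmodule.c",)),
    SetupEntry("_heapq", ("_heapqmodule.c",)),
    SetupEntry("math", ("mathmodule.c",)),
)


def render_setup_local(entries, mode="*shared*"):
    """Return the Setup.local text: a mode marker, then one line per module."""
    lines = [mode]
    for entry in entries:
        lines.append(" ".join((entry.name, *entry.sources, *entry.extra)))
    return "\n".join(lines) + "\n"


setup_local_text = render_setup_local(ENTRIES)
```

Keeping the module list in one table means the host and Docker build paths can both call the same renderer and cannot drift apart.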
What changed
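39 stdlib modules migrated from `*static*` to `*shared*`:

- Data primitives (no external deps): `_bisect`, `_heapq`, `_struct`,
  `_random`, `_opcode`, `_queue`, `_csv`, `binascii`, `_json`, `_pickle`,
  `_zoneinfo`.
- Math and memory (libm symbols resolved against `python.elf`): `math`,
  `cmath`, `_statistics`, `mmap`, `_contextvars`.
- Text codecs (no external deps): `unicodedata`, `_multibytecodec`,
  `_codecs_cn`, `_codecs_hk`, `_codecs_iso2022`, `_codecs_jp`, `_codecs_kr`,
  `_codecs_tw`.
- `_asyncio`, `_datetime`, `select`, `_socket`, `_posixsubprocess`, `fcntl`,
  `termios`.
- Bundled-in-CPython C deps (each `.so` bundles its own `.a`): `_decimal`,
  `pyexpat`, `_elementtree`, `_md5`, `_sha1`, `_sha2`, `_sha3`, `_blake2`.

Architecture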
For modules with bundled-in-CPython C deps (`_decimal`, `pyexpat`, `_sha2`),
each `.so` bundles its own copy of the vendored `.a`, which is exactly the
behavior of CPython's default `./configure` run on Linux without
`--with-system-libmpdec` / `--with-system-expat`:

| Module | Bundled `.a` | How it is linked |
| --- | --- | --- |
| `_decimal.cpython-312.so` | `libmpdec.a` | `MODULE__DECIMAL_LDFLAGS=$(LIBM) $(LIBMPDEC_A)` automatically (see B-1 fix below) |
| `pyexpat.cpython-312.so` | `libexpat.a` | `MODULE_PYEXPAT_LDFLAGS=$(LIBM) $(LIBEXPAT_A)` automatically (see B-1 fix below) |
| `_sha2.cpython-312.so` | `libHacl_Hash_SHA2.a` | `_sha2` line in `Setup.local` references the `.a` explicitly, matching upstream `Modules/Setup.stdlib.in` |
| `_elementtree.cpython-312.so` | (… `pyexpat` via `PyExpat_CAPI` capsule) | `_elementtree` imports `pyexpat` and pulls a `PyCapsule` of function pointers, so libexpat lives in `pyexpat.so` only |

`mpdec`, `expat`, and `libHacl_Hash_SHA2` are stateless C APIs with at most one
consumer each. Duplicating their code into `python.elf` would buy nothing:
none of them maintain process-global state that needs to be shared across
multiple loaded extensions.
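The capsule hand-off between `pyexpat` and `_elementtree` can be observed from Python. The snippet below is a small sketch run against a stock CPython, not against the Nanvix build: it reads the `expat_CAPI` attribute that `pyexpat` exports and then parses a document through `xml.etree.ElementTree`, which reaches expat through `pyexpat` rather than through its own copy of the library.

```python
import pyexpat
import xml.etree.ElementTree as ET

# pyexpat publishes its C API as a capsule object on the module itself.
capsule = pyexpat.expat_CAPI
capsule_type = type(capsule).__name__

# ElementTree parsing goes through pyexpat, so no second libexpat is needed.
root = ET.fromstring("<a><b/></a>")
child_tags = [child.tag for child in root]
```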
`--with-libm=` is cleared so libm symbols stay in `python.elf`; `math` /
`cmath` resolve them at dlopen time via the existing `--whole-archive` of
`libm.a` in `LIBNVX_CRT0`.

Commits
This PR contains two commits:
1. `[nanvix] E: Build stdlib extensions as .so`
   The main change. See file table below.
2. `configure: use $(LIBM) instead of literal -lm in LIBMPDEC/LIBEXPAT LDFLAGS`
   Small follow-up fix to CPython's `configure.ac`. The bundled-library code
   paths at `configure.ac:3834` and `:3918` hardcoded a literal `-lm` in
   `LIBEXPAT_LDFLAGS` and `LIBMPDEC_LDFLAGS`. On Linux this is a no-op because
   `LIBM` defaults to `-lm`, but on Nanvix we pass `--with-libm=` empty (so
   `libm.a` stays whole-archived into `python.elf` only). The literal `-lm`
   defeated that and bundled a ~400 KB copy of `libm.a` into both
   `_decimal.so` and `pyexpat.so`. Switching the literal to `$(LIBM)` is
   equivalent on Linux and correctly drops the duplicate libm when `LIBM` is
   empty. Both `configure.ac` and the generated `configure` are patched in
   lockstep so no `autoreconf` is required.

Mechanics
| File | Change |
| --- | --- |
| `.nanvix/setup_local.py` (new) | `SETUP_LOCAL_ENTRIES` data table + `render_setup_local()`: single source of truth for the `Modules/Setup.local` body, consumed by both the host (`.nanvix/lxml.py`) and Docker (`.nanvix/docker.py`) build paths. |
| `.nanvix/lxml.py` | `generate_setup_local()` renders via `setup_local.render_setup_local()`. |
| `.nanvix/docker.py` | `_generate_setup_local_cmd()` renders via the same helper and single-quotes each line for a `printf '%s\n' ... > Setup.local` invocation inside the container. |
| `.nanvix/test.py` | `_SO_MODULE_SANITY_CHECKS` table + `_render_so_sanity_snippets()` generator emit smoke-test snippets that exercise every migrated module via `import` + a trivial method call, and assert each module is no longer in `sys.builtin_module_names`. |
| `Makefile.nanvix` | `--with-libm=` (empty) so libm symbols stay in `python.elf`. No other configure-time or post-configure changes are needed; CPython's default `MODULE__DECIMAL_LDFLAGS` / `MODULE_PYEXPAT_LDFLAGS` handle the bundling automatically. |
| `configure.ac` / `configure` | Replaces literal `-lm` with `$(LIBM)` in `LIBEXPAT_LDFLAGS` and `LIBMPDEC_LDFLAGS`. |

Dependencies
Base branch: `feat/phase0-array-so` (the `array` Phase 0 proof of concept).

Runtime dependencies (already merged upstream):

- nanvix/nanvix#2472: libm visibility fix. Required so `math.so` / `cmath.so`
  can resolve libm symbols against `python.elf` `.dynsym` at dlopen time.
- nanvix/nanvix#2473: `dlfcn` init-array + DT_RUNPATH support. Required so
  `dlopen()` runs initialisers for the new `.so` modules and the loader can
  find dependencies.
Validation
- `./z lint` (black + pyright) clean.
- `pre-commit run` clean on all changed files (end-of-file-fixer,
  trailing-whitespace, check-case-conflict, check-merge-conflict).
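The smoke-test approach used for the migrated modules (import, one trivial call, and a check against `sys.builtin_module_names`) could be generated along the following lines. This is a minimal sketch: the `SANITY_CHECKS` table, its three entries, and `render_sanity_snippets()` are hypothetical stand-ins, not the actual `_SO_MODULE_SANITY_CHECKS` / `_render_so_sanity_snippets()` code in `.nanvix/test.py`.

```python
# Hypothetical table: module name -> one trivial expression expected to be true.
SANITY_CHECKS = {
    "math": "math.sqrt(4.0) == 2.0",
    "_bisect": "_bisect.bisect_left([1, 3], 2) == 1",
    "binascii": "binascii.hexlify(b'A') == b'41'",
}


def render_sanity_snippets(checks):
    """Return one standalone Python snippet (as source text) per module."""
    snippets = []
    for name, expr in sorted(checks.items()):
        snippets.append(
            "import sys\n"
            f"import {name}\n"
            # The module must import and answer a trivial call.
            f"assert {expr}\n"
            # A module built as .so must not be compiled into the interpreter.
            f"assert {name!r} not in sys.builtin_module_names\n"
        )
    return snippets


snippets = render_sanity_snippets(SANITY_CHECKS)
```

On an interpreter where one of these modules is still linked statically, the last assertion of its snippet fails, which is the regression this kind of check is meant to catch.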