[nanvix] E: Build stdlib extensions as .so #18
6b31bca to 86e1659
a78b0c8 to 2f602bf
| { | ||
| static char buf[32]; | ||
| DWORD dw = GetLastError(); | ||
| DWORD dw = GetLastError(); |
Let's create a separate PR to fix this
| }; | ||
|
|
||
| static struct cffi_tls_s *get_cffi_tls(void); /* in misc_thread_posix.h | ||
| static struct cffi_tls_s *get_cffi_tls(void); /* in misc_thread_posix.h |
This change also belongs in the same separate cleanup PR, as does the change on line 315 of this file.
| return x; | ||
| } | ||
| /* this hack is for Python 3.5, and also to give a more | ||
| /* this hack is for Python 3.5, and also to give a more |
Another change to the separate PR for cleaning
|
|
||
| #if PY_MAJOR_VERSION >= 3 | ||
| /* add manually 'module_name' in sys.modules: it seems that | ||
| /* add manually 'module_name' in sys.modules: it seems that |
Another change for the separate PR for cleaning
| return PyInt_AS_LONG(ob); | ||
| } | ||
| else | ||
| else |
All changes in this file should be in the separate PR for cleaning
| # Kept as a single shell statement so it embeds cleanly inside `sh -c | ||
| # '...'` and inside a regular recipe line without backslash-continuation | ||
| # gymnastics. | ||
| NANVIX_VISIBILITY_DEFAULT_PATCH = \ |
Is this the cleanest way to patch it? Is there a more fundamental way to achieve the same result?
Builds 39 CPython stdlib extension modules as runtime-loaded .so files
under lib/python3.12/lib-dynload/<name>.cpython-312.so instead of
statically linking them into python.elf. Generalises the *shared*
Setup.local flow first proven by feat/phase0-array-so.

Modules built as .so
--------------------
- Data primitives (no external deps): _bisect, _heapq, _struct, _random,
  _opcode, _queue, _csv, binascii, _json, _pickle, _zoneinfo.
- Math and memory (libm symbols resolved against python.elf): math, cmath,
  _statistics, mmap, _contextvars.
- Text codecs (no external deps): unicodedata, _multibytecodec,
  _codecs_cn/hk/iso2022/jp/kr/tw.
- With bundled-in-cpython C deps (each .so bundles its own .a, same as
  upstream cpython's default ./configure run): _asyncio, _datetime,
  _decimal, pyexpat, _elementtree, _md5, _sha1, _sha2, _sha3, _blake2,
  select, _socket, _posixsubprocess, fcntl, termios.

For the modules with bundled-in-cpython C deps (_decimal, pyexpat, _sha2),
each .so bundles its own copy of the vendored .a via cpython's normal
MODULE_*_LDFLAGS machinery -- exactly the behavior of cpython's default
./configure run on Linux without --with-system-libmpdec /
--with-system-expat. mpdec / expat / libHacl_Hash_SHA2 are stateless C APIs
with at most one consumer each (pyexpat exposes a PyCapsule that
_elementtree calls through, so libexpat lives in pyexpat.so only), so
duplication into python.elf would buy nothing.

--with-libm= is cleared so libm symbols stay in python.elf -- math / cmath
resolve them via the existing --whole-archive of libm.a in LIBNVX_CRT0.

Infrastructure
--------------
- .nanvix/setup_local.py (new): SETUP_LOCAL_ENTRIES data table +
  render_setup_local() -- single source of truth for Modules/Setup.local,
  consumed by both the host (.nanvix/lxml.py) and Docker
  (.nanvix/docker.py) build paths.
- .nanvix/test.py: _SO_MODULE_SANITY_CHECKS table +
  _render_so_sanity_snippets() emit smoke-test snippets that exercise every
  migrated module via import + a trivial method call and assert each module
  is no longer in sys.builtin_module_names.
- Makefile.nanvix: --with-libm= cleared (Phase 1B math/cmath modules
  resolve libm symbols against python.elf .dynsym at dlopen time via the
  existing --whole-archive of libm.a in LIBNVX_CRT0).

Runtime dependencies (already merged upstream)
----------------------------------------------
- nanvix/nanvix#2472 -- libm visibility fix.
- nanvix/nanvix#2473 -- dlfcn init-array + DT_RUNPATH support.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
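A minimal sketch of the "data table + renderer" idea from the Infrastructure notes above, assuming a simple entry shape. Only the names `SETUP_LOCAL_ENTRIES` and `render_setup_local()` come from the commit message; the `SetupEntry` type, the three example rows, and the `docker_printf_cmd()` helper are hypothetical illustrations, not the contents of `.nanvix/setup_local.py`.

```python
import shlex
from typing import NamedTuple


class SetupEntry(NamedTuple):
    """One Modules/Setup.local line: module name, sources, extra flags."""
    name: str
    sources: tuple
    extra: tuple = ()


# Illustrative subset only; the real table lists all migrated modules.
SETUP_LOCAL_ENTRIES = (
    SetupEntry("_bisect", ("_bisectmodule.c",)),
    SetupEntry("_heapq", ("_heapqmodule.c",)),
    SetupEntry("math", ("mathmodule.c",)),
)


def render_setup_local(entries=SETUP_LOCAL_ENTRIES) -> str:
    """Render the Setup.local body: a *shared* marker, then one line per module."""
    lines = ["*shared*"]
    for entry in entries:
        lines.append(" ".join((entry.name, *entry.sources, *entry.extra)))
    return "\n".join(lines) + "\n"


def docker_printf_cmd(body: str) -> str:
    """Hypothetical Docker-side helper: single-quote each line for printf."""
    quoted = " ".join(shlex.quote(line) for line in body.splitlines())
    return f"printf '%s\\n' {quoted} > Setup.local"


if __name__ == "__main__":
    body = render_setup_local()
    print(body, end="")
    print(docker_printf_cmd(body))
```

Keeping the table in one module means the host path can write the rendered text directly while the Docker path only changes how it is quoted, which is presumably why a single renderer serves both.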
64c14e9 to 6819a97
…FLAGS

When --with-system-libmpdec / --with-system-expat is *not* given, the
bundled-library code paths in configure.ac (lines 3831, 3915) hardcoded a
literal `-lm` in:

    LIBEXPAT_LDFLAGS="-lm $(LIBEXPAT_A)"
    LIBMPDEC_LDFLAGS="-lm $(LIBMPDEC_A)"

On Linux this happens to work because LIBM defaults to `-lm`, so the
literal is redundant but harmless. On Nanvix we pass `--with-libm=` (empty)
to keep libm.a out of every Setup.local `.so` -- libm is whole-archived
into python.elf instead. The literal `-lm` defeats that and bundles a full
copy of libm.a into both `_decimal.so` and `pyexpat.so` -- about 400 KB of
redundant code per .so, plus a symbol-collision risk that forces
--allow-multiple-definition at every other .so link.

Switching the literal to `$(LIBM)` is equivalent on Linux (LIBM=-lm by
default) and correctly drops the duplicate libm when LIBM is empty.

Both `configure.ac` and the generated `configure` are patched in lockstep
so no autoreconf is required.

Bug surfaced by the cpython-on-Nanvix .so management audit; tracked for a
follow-up upstream contribution to python/cpython once the Nanvix port
stabilizes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
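To make the effect of the `-lm` to `$(LIBM)` switch concrete, here is a small sketch that mimics Make-style variable expansion in Python. It is not part of the build; the `expand()` helper is hypothetical and the `LIBEXPAT_A` path is an assumed illustrative value.

```python
import re


def expand(value: str, variables: dict) -> str:
    """Expand $(NAME) references once, Make-style; unset names become empty."""
    expanded = re.sub(
        r"\$\((\w+)\)", lambda m: variables.get(m.group(1), ""), value
    )
    # Collapse the gap that an empty variable leaves behind.
    return " ".join(expanded.split())


# Assumed values for illustration only.
LINUX = {"LIBM": "-lm", "LIBEXPAT_A": "Modules/expat/libexpat.a"}
NANVIX = {**LINUX, "LIBM": ""}  # --with-libm= (empty)

HARDCODED = "-lm $(LIBEXPAT_A)"     # before the patch
PATCHED = "$(LIBM) $(LIBEXPAT_A)"   # after the patch

if __name__ == "__main__":
    for label, env in (("linux", LINUX), ("nanvix", NANVIX)):
        print(label, "before:", expand(HARDCODED, env))
        print(label, "after: ", expand(PATCHED, env))
```

Under these assumptions the two forms expand identically when `LIBM` is `-lm`, and only the patched form drops libm from the link line when `LIBM` is empty.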
|
Update: folded in audit finding B-1 as a new commit Without this fix, the stdlib The patch is symmetric on Linux ( Both Stack force-pushed to keep PRs #19 / #20 / #21 atop the new base. |
Summary
Builds 39 CPython stdlib extension modules as runtime-loaded `.so` files under `lib/python3.12/lib-dynload/<name>.cpython-312.so` instead of statically linking them into `python.elf`. Continues the work started by `feat/phase0-array-so` (the `array` proof of concept), generalising the `*shared*` `Setup.local` flow across the cpython stdlib.

Base branch: `feat/phase0-array-so`. This PR will be filed against `nanvix/cpython` with nanvix/cpython#690 as its base. Local tip `2b881a11521` matches that PR's HEAD exactly.

Why
Before this change, every cpython extension that Nanvix builds was forced into `*static*` in `Modules/Setup.local`, so the entire stdlib extension surface was linked into a monolithic `python.elf`. `lib-dynload/` was empty. Side effects:

- `python.elf` carried every extension's code even when a workload uses only a fraction of it.
- … `dlopen()` at run time because the cpython build was not actually producing extension `.so` files. This blocked any out-of-tree extension story.
- … (`_decimal` / `libmpdec`, `pyexpat` / `libexpat`, the SHA2 hash module / `libHacl_Hash_SHA2`) re-paid the bundling cost per extension.

This PR enables the dlopen flow for the stdlib by making 39 modules `*shared*`, matching upstream cpython's default `./configure` behavior on Linux.

What changed
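39 stdlib modules are migrated from `*static*` (linked into `python.elf`) to `*shared*` (built as `lib/python3.12/lib-dynload/<name>.cpython-312.so`):

- Data primitives (no external deps): `_bisect`, `_heapq`, `_struct`, `_random`, `_opcode`, `_queue`, `_csv`, `binascii`, `_json`, `_pickle`, `_zoneinfo`.
- Math and memory (libm symbols resolved against `python.elf`): `math`, `cmath`, `_statistics`, `mmap`, `_contextvars`.
- Text codecs (no external deps): `unicodedata`, `_multibytecodec`, `_codecs_cn`, `_codecs_hk`, `_codecs_iso2022`, `_codecs_jp`, `_codecs_kr`, `_codecs_tw`.
- With bundled-in-cpython C deps (each `.so` bundles its own `.a`): `_asyncio`, `_datetime`, `_decimal`, `pyexpat`, `_elementtree`, `_md5`, `_sha1`, `_sha2`, `_sha3`, `_blake2`, `select`, `_socket`, `_posixsubprocess`, `fcntl`, `termios`.

Architecture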
For modules with bundled-in-cpython C deps (`_decimal`, `pyexpat`, `_sha2`), each `.so` bundles its own copy of the vendored `.a`, exactly the behavior of cpython's default `./configure` run on Linux without `--with-system-libmpdec` / `--with-system-expat`:

| `.so` | Bundled `.a` | Notes |
| --- | --- | --- |
| `_decimal.cpython-312.so` | `libmpdec.a` | `MODULE__DECIMAL_LDFLAGS=-lm $(LIBMPDEC_A)` automatically |
| `pyexpat.cpython-312.so` | `libexpat.a` | `MODULE_PYEXPAT_LDFLAGS=-lm $(LIBEXPAT_A)` automatically |
| `_sha2.cpython-312.so` | `libHacl_Hash_SHA2.a` | `_sha2` line in `Setup.local` references the `.a` explicitly, matching upstream `Modules/Setup.stdlib.in:81` |
| `_elementtree.cpython-312.so` | none (`pyexpat` via `PyExpat_CAPI` capsule) | `_elementtree` does `import pyexpat` and pulls a `PyCapsule` of function pointers, so libexpat lives in `pyexpat.so` only |

`mpdec`, `expat`, and `libHacl_Hash_SHA2` are stateless C APIs with at most one consumer each. Duplicating their code into `python.elf` would buy nothing: none of them maintain process-global state that needs to be shared across multiple loaded extensions.

`--with-libm=` is cleared so libm symbols stay in `python.elf`; `math` / `cmath` resolve them via the existing `--whole-archive` of `libm.a` in `LIBNVX_CRT0` at dlopen time.

Mechanics
- `.nanvix/setup_local.py` (new): `SETUP_LOCAL_ENTRIES` data table + `render_setup_local()`, the single source of truth for the `Modules/Setup.local` body, consumed by both the host (`.nanvix/lxml.py`) and Docker (`.nanvix/docker.py`) build paths.
- `.nanvix/lxml.py`: `generate_setup_local()` renders via `setup_local.render_setup_local()`.
- `.nanvix/docker.py`: `_generate_setup_local_cmd()` renders via the same helper and single-quotes each line for a `printf '%s\n' ... > Setup.local` invocation inside the container.
- `.nanvix/test.py`: `_SO_MODULE_SANITY_CHECKS` table + `_render_so_sanity_snippets()` generator emit smoke-test snippets that exercise every migrated module via `import` + a trivial method call, and assert the module is no longer in `sys.builtin_module_names`.
- `Makefile.nanvix`: `--with-libm=` (empty) so libm symbols stay in `python.elf`. No other configure-time or post-configure changes are needed; cpython's default `MODULE__DECIMAL_LDFLAGS` / `MODULE_PYEXPAT_LDFLAGS` handle the bundling automatically.

Dependencies
Base branch: `feat/phase0-array-so` (the `array` Phase 0 proof of concept).

Runtime dependencies (already merged upstream):

- nanvix/nanvix#2472: libm visibility fix. Required so `math.so` / `cmath.so` can resolve libm symbols against `python.elf` `.dynsym` at dlopen time.
- nanvix/nanvix#2473: `dlfcn` init-array + DT_RUNPATH support. Required so `dlopen()` runs initialisers for the new `.so` modules and the loader can find dependencies.
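The smoke checks described under Mechanics could look roughly like the sketch below: a table of trivial expressions is rendered into standalone snippets, each of which imports a module, exercises it, and asserts it is not a builtin. The table shape, the `render_snippet()` name, and the `require_shared` switch are assumptions for illustration; the actual `_SO_MODULE_SANITY_CHECKS` and `_render_so_sanity_snippets()` in `.nanvix/test.py` may differ.

```python
# Hypothetical shape: module name -> expression evaluated on the imported
# module, bound to `m`. The real table covers every migrated module.
SANITY_CHECKS = {
    "_bisect": "m.bisect_left([1, 2, 3], 2) == 1",
    "math": "m.sqrt(4.0) == 2.0",
}


def render_snippet(name: str, expr: str, require_shared: bool = True) -> str:
    """Render a self-contained Python snippet for one module."""
    lines = [
        "import importlib, sys",
        f"m = importlib.import_module({name!r})",
        f"assert {expr}",
    ]
    if require_shared:
        # On the target, a migrated module must no longer be built in.
        lines.append(f"assert {name!r} not in sys.builtin_module_names")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for name, expr in SANITY_CHECKS.items():
        print(render_snippet(name, expr))
```

The `require_shared` flag exists only so the functional part of a snippet can be tried on a host interpreter where some of these modules may still be compiled in.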