[nanvix] E: Build stdlib extensions as .so #732

Open
esaurez wants to merge 2 commits into feat/phase0-array-so from feat/wave5-pr-a-stdlib-so

@esaurez esaurez commented Jun 15, 2026


Summary

Builds 39 CPython stdlib extension modules as runtime-loaded .so
files under lib/python3.12/lib-dynload/<name>.cpython-312.so instead
of statically linking them into python.elf. Continues the work
started by feat/phase0-array-so
(the array proof of concept), generalising the *shared*
Setup.local flow across the CPython stdlib.

Base branch: feat/phase0-array-so. This PR is filed against that
branch, so it should be merged after #690 lands.

Why

Before this change, every CPython extension that Nanvix builds was
forced into *static* in Modules/Setup.local, so the entire stdlib
extension surface was linked into a monolithic python.elf.
lib-dynload/ was empty. Side effects:

  • python.elf carried every extension's code even when a workload
    uses only a fraction of it.
  • Third-party C extensions could not be loaded via dlopen() at run
    time because the CPython build was not actually producing
    extension .so files. This blocked any out-of-tree extension
    story.
  • Stdlib extensions that bundle their own copies of vendored C
    libraries (_decimal/libmpdec, pyexpat/libexpat, the SHA-2
    hash module / libHacl_Hash_SHA2) paid the bundling cost as part
    of the monolith.

This PR enables the dlopen flow for the stdlib by making 39 modules
*shared* — matching upstream CPython's default ./configure
behavior on Linux.

What changed

39 stdlib modules migrated from *static* to *shared*:

  • Data primitives, no external deps: _bisect, _heapq,
    _struct, _random, _opcode, _queue, _csv, binascii,
    _json, _pickle, _zoneinfo.
  • Math and memory, libm symbols resolved against python.elf:
    math, cmath, _statistics, mmap, _contextvars.
  • Text codecs, no external deps: unicodedata,
    _multibytecodec, _codecs_cn, _codecs_hk, _codecs_iso2022,
    _codecs_jp, _codecs_kr, _codecs_tw.
  • POSIX wrappers + concurrency / datetime, no external deps:
    _asyncio, _datetime, select, _socket, _posixsubprocess,
    fcntl, termios.
  • With bundled-in-CPython C deps, each .so bundles its own
    .a:
    _decimal, pyexpat, _elementtree, _md5, _sha1,
    _sha2, _sha3, _blake2.

Architecture

For modules with bundled-in-CPython C deps (_decimal, pyexpat,
_sha2), each .so bundles its own copy of the vendored .a. This is
exactly the behavior of CPython's default ./configure run on Linux
without --with-system-libmpdec / --with-system-expat:

| Module | Bundled .a | How it gets in |
|---|---|---|
| _decimal.cpython-312.so | libmpdec.a | CPython's configure sets MODULE__DECIMAL_LDFLAGS=$(LIBM) $(LIBMPDEC_A) automatically (see B-1 fix below) |
| pyexpat.cpython-312.so | libexpat.a | CPython's configure sets MODULE_PYEXPAT_LDFLAGS=$(LIBM) $(LIBEXPAT_A) automatically (see B-1 fix below) |
| _sha2.cpython-312.so | libHacl_Hash_SHA2.a | the _sha2 line in Setup.local references the .a explicitly, matching upstream Modules/Setup.stdlib.in |
| _elementtree.cpython-312.so | none (calls into pyexpat via the PyExpat_CAPI capsule) | upstream pattern: _elementtree imports pyexpat and pulls a PyCapsule of function pointers, so libexpat lives in pyexpat.so only |

mpdec, expat, and libHacl_Hash_SHA2 are stateless C APIs with
at most one consumer each. Placing a copy of their code in python.elf
would buy nothing, because none of them maintains process-global
state that needs to be shared across multiple loaded extensions.

--with-libm= is cleared so libm symbols stay in python.elf;
math / cmath resolve them at dlopen time via the existing
--whole-archive of libm.a in LIBNVX_CRT0.

Modules without external deps:
  <module>.cpython-312.so
      └── UND symbols → resolved against python.elf .dynsym at dlopen
                         (--export-dynamic on python.elf LDFLAGS)

Modules with bundled-in-CPython C deps:
  _decimal.cpython-312.so       (bundles libmpdec.a; no external deps)
  pyexpat.cpython-312.so        (bundles libexpat.a;  no external deps)
  _sha2.cpython-312.so          (bundles libHacl_Hash_SHA2.a)
  _elementtree.cpython-312.so   (uses pyexpat's PyExpat_CAPI capsule)
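
The symbol-resolution rule in the diagram above can be checked offline. The following is a minimal sketch, assuming the dynamic symbol tables have already been dumped as text in the usual `nm -D` layout; the helper names and the sample dumps are hypothetical and are not part of this PR.

```python
def parse_nm(output: str) -> tuple[set[str], set[str]]:
    """Split `nm -D`-style text into (undefined, defined) symbol names."""
    undefined: set[str] = set()
    defined: set[str] = set()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "U":
            # Undefined symbols have no address column: "U <name>".
            undefined.add(fields[1].split("@", 1)[0])
        elif len(fields) == 3 and fields[1].isupper():
            # "<address> <type> <name>"; uppercase types are global symbols.
            defined.add(fields[2].split("@", 1)[0])
    return undefined, defined


def unresolved(so_nm: str, exe_nm: str) -> set[str]:
    """Undefined symbols of the .so that the executable does not export."""
    so_undefined, _ = parse_nm(so_nm)
    _, exe_defined = parse_nm(exe_nm)
    return so_undefined - exe_defined


# Illustrative dumps only; real output comes from running nm on the binaries.
SAMPLE_SO = """\
                 U PyFloat_FromDouble
                 U sqrt
0000000000001120 T PyInit_math
"""
SAMPLE_EXE = """\
0000000000401000 T PyFloat_FromDouble
0000000000402000 T sqrt
"""

print(sorted(unresolved(SAMPLE_SO, SAMPLE_EXE)))
```

An empty result means every UND symbol of the module has a matching export; a missing libm export in python.elf would show up here as a leftover name such as `sqrt`.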

Commits

This PR contains two commits:

1. [nanvix] E: Build stdlib extensions as .so

The main change. See file table below.

2. configure: use $(LIBM) instead of literal -lm in LIBMPDEC/LIBEXPAT LDFLAGS

Small follow-up fix to CPython's configure.ac. The bundled-library
code paths at configure.ac:3834 and :3918 hardcoded a literal
-lm in LIBEXPAT_LDFLAGS and LIBMPDEC_LDFLAGS. On Linux this is
a no-op because LIBM defaults to -lm, but on Nanvix we pass
--with-libm= empty (so libm.a stays whole-archived into
python.elf only). The literal -lm defeated that and bundled a
~400 KB copy of libm.a into both _decimal.so and pyexpat.so.

Switching the literal to $(LIBM) is equivalent on Linux and
correctly drops the duplicate libm when LIBM is empty. Both
configure.ac and the generated configure are patched in lockstep
so no autoreconf is required.
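
To make the difference concrete, here is a small sketch that mimics make-style `$(NAME)` expansion for the two spellings of the flags. The `expand` helper and the archive path are illustrative assumptions, not code from configure or from this PR.

```python
import re


def expand(template: str, variables: dict[str, str]) -> str:
    """Expand $(NAME) references roughly as make would, then tidy spaces."""
    expanded = re.sub(
        r"\$\((\w+)\)",
        lambda match: variables.get(match.group(1), ""),
        template,
    )
    return " ".join(expanded.split())


LIBMPDEC_A = "Modules/_decimal/libmpdec/libmpdec.a"  # illustrative path
BEFORE = "-lm $(LIBMPDEC_A)"      # literal -lm, as before the fix
AFTER = "$(LIBM) $(LIBMPDEC_A)"   # follows --with-libm=, as after the fix

for label, libm in (("Linux default", "-lm"), ("Nanvix, --with-libm= empty", "")):
    env = {"LIBM": libm, "LIBMPDEC_A": LIBMPDEC_A}
    print(f"{label}: before={expand(BEFORE, env)!r} after={expand(AFTER, env)!r}")
```

With `LIBM=-lm` both templates expand to the same flags; with an empty `LIBM` only the patched template drops `-lm`.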

Mechanics

| File | Change |
|---|---|
| .nanvix/setup_local.py (new) | SETUP_LOCAL_ENTRIES data table + render_setup_local(): single source of truth for the Modules/Setup.local body, consumed by both the host (.nanvix/lxml.py) and Docker (.nanvix/docker.py) build paths. |
| .nanvix/lxml.py | generate_setup_local() renders via setup_local.render_setup_local(). |
| .nanvix/docker.py | _generate_setup_local_cmd() renders via the same helper and single-quotes each line for a printf '%s\n' ... > Setup.local invocation inside the container. |
| .nanvix/test.py | _SO_MODULE_SANITY_CHECKS table + _render_so_sanity_snippets() generator emit smoke-test snippets that exercise every migrated module via import + a trivial method call, and assert each module is no longer in sys.builtin_module_names. |
| Makefile.nanvix | Sets --with-libm= (empty) so libm symbols stay in python.elf. No other configure-time or post-configure changes are needed; CPython's default MODULE__DECIMAL_LDFLAGS / MODULE_PYEXPAT_LDFLAGS handle the bundling automatically. |
| configure.ac / configure | B-1 fix: replace literal -lm with $(LIBM) in LIBEXPAT_LDFLAGS and LIBMPDEC_LDFLAGS. |
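
As a rough illustration of the single-source-of-truth idea, the sketch below renders a Setup.local body from a data table and then builds a shell-quoted printf command for the container. The names SETUP_LOCAL_ENTRIES and render_setup_local come from this PR, but the tuple shape, the two sample entries, and `docker_printf_command` are assumptions made for the example; the real helpers may look different.

```python
import shlex

# Assumed shape: (module name, [source files and flags]). Two sample rows only.
SETUP_LOCAL_ENTRIES = [
    ("_bisect", ["_bisectmodule.c"]),
    ("math", ["mathmodule.c"]),
]


def render_setup_local(entries) -> list[str]:
    """Return Setup.local lines: a *shared* marker, then one line per module."""
    lines = ["*shared*"]
    for name, args in entries:
        lines.append(" ".join([name, *args]))
    return lines


def docker_printf_command(lines, dest="Modules/Setup.local") -> str:
    """Build a printf invocation that writes one quoted argument per line."""
    quoted = " ".join(shlex.quote(line) for line in lines)
    return f"printf '%s\\n' {quoted} > {shlex.quote(dest)}"


lines = render_setup_local(SETUP_LOCAL_ENTRIES)
print("\n".join(lines))                # what a host build would write directly
print(docker_printf_command(lines))    # what a container build would run
```

Quoting each line matters because `*shared*` would otherwise be subject to shell globbing inside the container.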

Dependencies

Base branch: feat/phase0-array-so
(the array Phase 0 proof of concept).

Runtime dependencies (already merged upstream):

  • nanvix/nanvix#2472
    libm visibility fix. Required so math.so / cmath.so can
    resolve libm symbols against python.elf .dynsym at dlopen time.
  • nanvix/nanvix#2473
    dlfcn init-array + DT_RUNPATH support. Required so dlopen()
    runs initialisers for the new .so modules and the loader can
    find dependencies.

Validation

  • ./z lint (black + pyright) clean.
  • pre-commit run clean on all changed files (end-of-file-fixer,
    trailing-whitespace, check-case-conflict, check-merge-conflict).
  • Full Nanvix CI runs via the workflow.
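
The per-module smoke tests that .nanvix/test.py generates could look roughly like the sketch below. The table name comes from this PR, but its shape, the sample checks, the singular `render_so_sanity_snippet` helper, and the `expect_shared` switch are hypothetical.

```python
# Assumed shape: (module name, expression that must evaluate truthy).
_SO_MODULE_SANITY_CHECKS = [
    ("math", "math.sqrt(4.0) == 2.0"),
    ("_bisect", "_bisect.bisect_left([1, 3], 2) == 1"),
    ("binascii", "binascii.crc32(b'') == 0"),
]


def render_so_sanity_snippet(name: str, check: str, expect_shared: bool = True) -> str:
    """Emit a snippet that imports a module, exercises it, and optionally
    asserts that it is not compiled into the interpreter."""
    lines = [
        "import sys",
        f"import {name}",
        f"assert {check}, {name!r}",
    ]
    if expect_shared:
        lines.append(f"assert {name!r} not in sys.builtin_module_names, {name!r}")
    return "\n".join(lines) + "\n"


for module_name, expression in _SO_MODULE_SANITY_CHECKS:
    # On a host interpreter some of these modules may be built in, so the
    # shared-object assertion is disabled when trying the snippets locally.
    exec(render_so_sanity_snippet(module_name, expression, expect_shared=False), {})
```

On the target, the snippets would be rendered with the shared-object assertion enabled, so a module that silently fell back to static linking fails the test.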

Enrique Saurez and others added 2 commits June 12, 2026 16:42
Builds 39 CPython stdlib extension modules as runtime-loaded .so files
under lib/python3.12/lib-dynload/<name>.cpython-312.so instead of
statically linking them into python.elf. Generalises the *shared*
Setup.local flow first proven by feat/phase0-array-so.

Modules built as .so
--------------------
- Data primitives (no external deps): _bisect, _heapq, _struct,
  _random, _opcode, _queue, _csv, binascii, _json, _pickle, _zoneinfo.
- Math and memory (libm symbols resolved against python.elf): math,
  cmath, _statistics, mmap, _contextvars.
- Text codecs (no external deps): unicodedata, _multibytecodec,
  _codecs_cn/hk/iso2022/jp/kr/tw.
- POSIX wrappers + concurrency / datetime (no external deps): _asyncio,
  _datetime, select, _socket, _posixsubprocess, fcntl, termios.
- With bundled-in-cpython C deps (each .so bundles its own .a, same
  as upstream cpython's default ./configure run): _decimal, pyexpat,
  _elementtree, _md5, _sha1, _sha2, _sha3, _blake2.

For the modules with bundled-in-cpython C deps (_decimal, pyexpat,
_sha2), each .so bundles its own copy of the vendored .a via
cpython's normal MODULE_*_LDFLAGS machinery -- exactly the behavior
of cpython's default ./configure run on Linux without
--with-system-libmpdec / --with-system-expat. mpdec / expat /
libHacl_Hash_SHA2 are stateless C APIs with at most one consumer
each (pyexpat exposes a PyCapsule that _elementtree calls through,
so libexpat lives in pyexpat.so only), so duplication into
python.elf would buy nothing.

--with-libm= is cleared so libm symbols stay in python.elf -- math /
cmath resolve them via the existing --whole-archive of libm.a in
LIBNVX_CRT0.

Infrastructure
--------------
- .nanvix/setup_local.py (new): SETUP_LOCAL_ENTRIES data table +
  render_setup_local() -- single source of truth for
  Modules/Setup.local, consumed by both the host (.nanvix/lxml.py)
  and Docker (.nanvix/docker.py) build paths.

- .nanvix/test.py: _SO_MODULE_SANITY_CHECKS table +
  _render_so_sanity_snippets() emit smoke-test snippets that
  exercise every migrated module via import + a trivial method call
  and assert each module is no longer in sys.builtin_module_names.

- Makefile.nanvix: --with-libm= cleared (Phase 1B math/cmath modules
  resolve libm symbols against python.elf .dynsym at dlopen time
  via the existing --whole-archive of libm.a in LIBNVX_CRT0).

Runtime dependencies (already merged upstream)
----------------------------------------------
- nanvix/nanvix#2472 -- libm visibility fix.
- nanvix/nanvix#2473 -- dlfcn init-array + DT_RUNPATH support.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…FLAGS

When --with-system-libmpdec / --with-system-expat is *not* given,
the bundled-library code paths in configure.ac (lines 3831, 3915)
hardcoded a literal `-lm` in:

  LIBEXPAT_LDFLAGS="-lm $(LIBEXPAT_A)"
  LIBMPDEC_LDFLAGS="-lm $(LIBMPDEC_A)"

On Linux this happens to work because LIBM defaults to `-lm`, so the
literal is equivalent to `$(LIBM)`. On Nanvix we pass `--with-libm=`
(empty) to keep libm.a out of every Setup.local `.so` -- libm is
whole-archived into python.elf instead. The literal `-lm` defeats that
and bundles a full copy of libm.a into both `_decimal.so` and
`pyexpat.so` -- about 400 KB of redundant code per .so, plus a
symbol-collision risk that forces --allow-multiple-definition at every
other .so link.

Switching the literal to `$(LIBM)` is equivalent on Linux (LIBM=-lm
by default) and correctly drops the duplicate libm when LIBM is
empty.

Both `configure.ac` and the generated `configure` are patched in
lockstep so no autoreconf is required.

Bug surfaced by the cpython-on-Nanvix .so management audit; tracked
for a follow-up upstream contribution to python/cpython once the
Nanvix port stabilizes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>