[nanvix] E: Phase 4 — build lxml C extensions as .so (unbundled)#11
Open
esaurez wants to merge 1 commit into
1ccd71d to 3c60ab2
7c7e39f to 453b1e3
3c60ab2 to 81de428
19e231b to 399a6d6
This was referenced Jun 4, 2026
esaurez pushed a commit to esaurez/lxml that referenced this pull request, Jun 4, 2026
Produce position-independent liblxml_etree.so and liblxml_elementpath.so alongside the existing static archives, wired as a real DT_NEEDED chain on top of esaurez/libxml2 + esaurez/libxslt:

    liblxml_etree.so       -> NEEDED libxslt.so, libexslt.so, libxml2.so
    liblxml_elementpath.so -> (pure-Cython, no native deps)

Only the cython-generated lxml.etree.c is embedded in liblxml_etree.so; libxslt, libxml2, and libz live in their own .so files and are pulled in transitively by the Nanvix dynamic loader at dlopen time. This exercises the DT_NEEDED chain support shipped in esaurez/nanvix#27 in a real-world setting and eliminates the multi-megabyte per-module duplication that a self-contained build would cause.

Concretely:

* `-fPIC` is added to the per-source compile commands, so the same .o files are usable for both .a and .so.
* Two new SHAREDLIB targets link via `-shared -fPIC -nostdlib -Wl,--whole-archive <own>.a -Wl,--no-whole-archive [-lxslt -lexslt -lxml2]`, setting DT_SONAME=liblxml_etree.so / DT_SONAME=liblxml_elementpath.so.
* `.nanvix/z.py` `output_files` and the Makefile's `package` / `verify-package` targets ship both the static and shared variants.

Sizes (stripped, DT_NEEDED chain vs the discarded self-contained prototype):

    liblxml_etree.so        1.7 MB (was 3.5 MB)
    liblxml_elementpath.so  157 KB (was 153 KB; pure-Cython, no deps)

Runtime dependencies:

* esaurez/nanvix#27: `.init_array` invocation + DT_NEEDED chain walking in the user-space loader.
* esaurez/libxml2#1 + esaurez/libxslt#1: libxml2.so, libxslt.so, and libexslt.so must be present in the buildroot.

This implies a sequenced rollout: merge libxml2#1 -> release -> bump libxslt's pin -> merge libxslt#1 -> release -> bump this PR's pins -> merge this PR.

End-to-end validation (DT_NEEDED chain resolved by the Nanvix loader: liblxml_etree.so -> libxslt.so -> libxml2.so) will land in a follow-up against esaurez/cpython#11. CPython's Phase 4 will switch from the MODLIBS-piggyback workaround to a clean dlopen of liblxml_etree.so, letting python.elf shrink by ~3 MB.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
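The transitive pull-in described in this commit message can be pictured as a dependency-first walk over `DT_NEEDED` entries. The sketch below is a toy model in Python, not the Nanvix loader's code: the `NEEDED` table is assembled from the edges named in this pull request, and `load_order` is a hypothetical helper. It visits each library once, so a dependency shared by several libraries (such as `libxml2.so`) appears only once in the result.

```python
# Toy model of DT_NEEDED resolution; NOT the Nanvix loader's implementation.
# The edges below are the ones named in this pull request.
NEEDED = {
    "liblxml_etree.so": ["libxslt.so", "libexslt.so", "libxml2.so"],
    "liblxml_elementpath.so": [],
    "libexslt.so": ["libxslt.so", "libxml2.so"],
    "libxslt.so": ["libxml2.so"],
    "libxml2.so": [],
}


def load_order(root, needed=NEEDED):
    """Return libraries in dependency-first order, each exactly once."""
    order = []
    done = set()
    in_progress = set()

    def visit(lib):
        if lib in done:
            return  # already handled via another path (the diamond case)
        if lib in in_progress:
            raise ValueError(f"dependency cycle through {lib}")
        in_progress.add(lib)
        for dep in needed[lib]:
            visit(dep)
        in_progress.remove(lib)
        done.add(lib)
        order.append(lib)

    visit(root)
    return order


print(load_order("liblxml_etree.so"))
# ['libxml2.so', 'libxslt.so', 'libexslt.so', 'liblxml_etree.so']
```

A real loader may choose a different traversal; the only property illustrated here is that every dependency is placed before its dependents and never twice.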
Phase 4 of the .a -> .so migration (see nanvix-todo/cpython-static-to-shared-migration.md section 8).

Promotes the 2 lxml C extension modules (_lxml_etree, _lxml_elementpath) from statically linked into python.elf to dlopen-loaded shared objects, and unbundles their underlying C archives (liblxml_etree, liblxml_elementpath, libxslt, libexslt, libxml2) into python.elf so each .so stays a thin shim.

Modules/Setup.local changes (.nanvix/docker.py):

- _lxml_etree and _lxml_elementpath move from *static* to *shared* with no per-module -L/-l flags (no bundling).
- The lxml C archives are attached to the _nanvix static-module line via `-Wl,--whole-archive liblxml_etree.a liblxml_elementpath.a libxslt.a libexslt.a libxml2.a -Wl,--no-whole-archive`. CPython's Setup processor places these in MODLIBS, so they flow into the python.elf link command but not into autoconf conftest links (which lack libpython3.12.a and therefore cannot resolve the Python C API symbols those archives reference).
- python.elf re-exports the lxml symbols via --export-dynamic, and the .so shims resolve PyInit_etree / PyInit_elementpath via the main exe's .dynsym at dlopen time.

Both shim files (lxml_etree_builtin.c, lxml_elementpath_builtin.c) are unchanged. The lxml/etree.py Python-level re-export shim continues to work because Python's import system finds the .so files in lib/python3.12/lib-dynload/ at import time.

Size impact (vs Phase 3 baseline, stripped):

- _lxml_etree.cpython-312.so: ~3 MB -> 8.3 KB (-3 MB)
- _lxml_elementpath.cpython-312.so: ~3 MB -> 8.3 KB (-3 MB)
- python.elf: 8.48 MB -> 12.26 MB (+3.78 MB)

Net: small overall growth (the shared lxml code now lives in python.elf, but each .so is now ~8 KB instead of ~3 MB, matching the canonical upstream CPython model).

Validation:

- z build PASS (configure conftest unaffected by the MODLIBS trick, verified by inspecting config.log)
- z test PASS (lxml smoke + 160/160 regrtest modules)
- lxml import via dlopen confirmed by phase test probe output

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
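To make the Setup.local layout in this commit message concrete, here is a rough Python sketch of how such a file could be rendered. It is an illustration under assumptions rather than the code in `.nanvix/docker.py`: `render_setup_local` and `whole_archive_flags` are hypothetical helpers, and `nanvix_builtin.c` is a placeholder for the `_nanvix` module's source file, which the commit message does not name.

```python
# Archive list exactly as given in the commit message above.
ARCHIVES = [
    "liblxml_etree.a",
    "liblxml_elementpath.a",
    "libxslt.a",
    "libexslt.a",
    "libxml2.a",
]


def whole_archive_flags(archives):
    """Wrap the archives so the linker keeps every object they contain."""
    return " ".join(["-Wl,--whole-archive", *archives, "-Wl,--no-whole-archive"])


def render_setup_local(nanvix_source="nanvix_builtin.c"):
    """Render only the lines relevant to lxml.

    nanvix_source is a placeholder name (assumption); the real file
    presumably lists other modules as well.
    """
    lines = [
        "*static*",
        # The archives ride along on the static _nanvix line (MODLIBS).
        f"_nanvix {nanvix_source} {whole_archive_flags(ARCHIVES)}",
        "",
        "*shared*",
        # Shim modules: source file only, no per-module -L/-l flags.
        "_lxml_etree lxml_etree_builtin.c",
        "_lxml_elementpath lxml_elementpath_builtin.c",
    ]
    return "\n".join(lines) + "\n"


print(render_setup_local())
```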
399a6d6 to 1a14f4d
This was referenced Jun 5, 2026
Closed
Summary
Phase 4 of the `.a` → `.so` migration (roadmap). Switches the `lxml` extension chain from being statically bundled into `python.elf` to being loaded at runtime via `dlopen` of `liblxml_etree.so`, with a proper `DT_NEEDED` chain pulling in `libxslt.so` → `libexslt.so` → `libxml2.so` transitively.

Architecture
The shim modules (`Modules/lxml_etree_builtin.c`, `Modules/lxml_elementpath_builtin.c`) are tiny C wrappers that:

- `dlopen(RTLD_NOW | RTLD_GLOBAL)` the underlying `.so` from `/lib/python3.12/lib-dynload/`.
- `dlsym` `PyInit_etree` / `PyInit__elementpath`.
- `dlclose` only on the failure path.

The Setup.local entries (`*shared*` block) consume only the shim `.c` files: no `-L`/`-l` flags, no `--whole-archive`, no MODLIBS piggyback. All resolution happens at dlopen time against `python.elf`'s `.dynsym` (which already exports the full POSIX/libc/libm runtime via PR #1).

Size impact
`python.elf`
`_lxml_etree.cpython-312.so`
`liblxml_etree.so`

~2.3 MB reclaimed from the binary; the lxml code now ships once on disk rather than being duplicated per-process.
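The shim control flow described under Architecture (open, look up the init symbol, close only on failure) can be sketched as follows. The real shims are C; this is a Python analog built on `ctypes`, offered only as an illustration. `resolve_init` is a hypothetical helper, and `_ctypes.dlclose` is a private CPython function available on POSIX builds, used here as a stand-in for the C `dlclose` call.

```python
import ctypes
import os

import _ctypes  # private CPython module; provides dlclose on POSIX builds


def resolve_init(so_path, symbol):
    """Open so_path, look up symbol, and close the handle only on failure."""
    try:
        # Mirrors dlopen(path, RTLD_NOW | RTLD_GLOBAL).
        lib = ctypes.CDLL(so_path, mode=os.RTLD_NOW | os.RTLD_GLOBAL)
    except OSError as exc:
        # dlopen itself failed, so there is no handle to release.
        raise ImportError(f"cannot load {so_path}: {exc}") from exc
    try:
        # Mirrors dlsym(handle, symbol); ctypes raises AttributeError if absent.
        init = getattr(lib, symbol)
    except AttributeError as exc:
        # Failure path: release the handle before reporting the error.
        _ctypes.dlclose(lib._handle)
        raise ImportError(f"{symbol} not found in {so_path}") from exc
    # Success path: the handle is intentionally kept open for the process lifetime.
    return lib, init
```

A call such as `resolve_init("/lib/python3.12/lib-dynload/liblxml_etree.so", "PyInit_etree")` would either return the open library and the init function or raise `ImportError`, which matches the "fails at import" behaviour the description mentions for a missing `.so`.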
What changed
- `Modules/lxml_etree_builtin.c`: `--whole-archive`-embedded module). Leaks handle on success, `dlclose` on dlsym failure.
- `Modules/lxml_elementpath_builtin.c`: `_elementpath`.
- `.nanvix/lxml.py`: `_SETUP_LOCAL_TEMPLATE` rewritten to bare `_lxml_etree lxml_etree_builtin.c` (no link flags). `generate_setup_local` keeps its signature for source-compat with `build.py`.
- `.nanvix/docker.py`: `_generate_setup_local_cmd` mirrors `lxml.py`; the `_nanvix` line no longer carries MODLIBS-piggyback flags.
- `.nanvix/package.py`: `liblxml_etree.so`, `liblxml_elementpath.so`, `libxslt.so`, `libexslt.so`, `libxml2.so` into the release tarball's `lib/python3.12/lib-dynload/`. Hard-fails with `FileNotFoundError` if any are missing.
- `.nanvix/test.py`

Validation
End-to-end test on `nanvix-dev`:

The 3-deep `DT_NEEDED` chain plus the diamond at `libexslt.so` → {`libxslt`, `libxml2`} ∪ `liblxml_etree.so` → {`libxslt`, `libexslt`, `libxml2`} resolves cleanly thanks to the loader fixes in esaurez/nanvix#27 and esaurez/nanvix#28.

Independent regrtest run: 122 / 185 PASS, 0 regressions vs the pre-rework baseline. The remaining failures (44 ERROR + 19 TIMEOUT) are all pre-existing Nanvix issues independent of this PR (see `nanvix-todo/cpython-libregrtest-compile-time-import.md` and the unittest-teardown hang).

Code review
A code-review pass found three issues, all fixed in the commit pushed here:

- `.nanvix/lxml.py`'s template still carried the old MODLIBS-piggyback `-L`/`-l` flags, which would silently break local (non-docker) builds.
- `dlopen` handle was leaked on the `dlsym`-failure path; now `dlclose`'d.
- `test.py` / `package.py` previously printed `WARNING` and continued when required `.so` files were missing, producing silently broken artifacts. Now they raise `FileNotFoundError`.

Runtime dependencies (must ship together)
- [syscall] E: Run dlopen ctors/dtors and DT_RUNPATH. Required for the lxml chain to initialize (Cython modules rely on `.init_array`).
- [syscall] B: Fix diamond DT_NEEDED handling. The lxml chain forms a diamond at `libexslt` and `liblxml_etree`; without this fix `dlopen` deadlocks or double-loads.
- `libxml2.so` (bottom of the chain).
- `libxslt.so` + `libexslt.so` with `DT_NEEDED libxml2.so`.
- `liblxml_etree.so` + `liblxml_elementpath.so` with full `DT_NEEDED` chain.

Build-time dependencies (in this repo)
- `python.elf` and re-exports via `--export-dynamic`. The dlopen'd `.so` chain leaves libc/libm/libposix/libstdc++ symbols undefined and binds them against `python.elf` at load time.
- `libnvx_crt0.a` startup (required for any cpython build with the current toolchain).
- `ninja` and `Cython` into the toolchain image (required by the lxml port-lib builds).

Sequenced rollout
Until step 5, this repo's CI can still produce a working `python.elf` against the existing static-only release tarballs (the shim modules degrade to dlopen-fails-at-import, which is a self-contained failure rather than a build break).
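The hard-fail behaviour described under Code review (raise `FileNotFoundError` instead of printing a warning and continuing) can be sketched like this. It is a minimal illustration, not the code in `.nanvix/package.py` or `.nanvix/test.py`: `require_shared_libs` is a hypothetical helper, and the directory passed in stands for wherever the build places the `.so` files.

```python
from pathlib import Path

# Shared objects the description says must ship in lib-dynload.
REQUIRED_SO = (
    "liblxml_etree.so",
    "liblxml_elementpath.so",
    "libxslt.so",
    "libexslt.so",
    "libxml2.so",
)


def require_shared_libs(lib_dir, names=REQUIRED_SO):
    """Return the paths of all required libraries, or raise if any is absent."""
    lib_dir = Path(lib_dir)
    missing = [name for name in names if not (lib_dir / name).is_file()]
    if missing:
        # Fail loudly rather than producing a silently broken artifact.
        raise FileNotFoundError(
            f"missing shared libraries in {lib_dir}: {', '.join(missing)}"
        )
    return [lib_dir / name for name in names]
```

Collecting every missing name before raising means a single failed run reports the full set of absent libraries rather than only the first one.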