
[nanvix] E: Remove lxml built-in shim from cpython #749

Open
esaurez wants to merge 1 commit into feat/wave5-pr-a-stdlib-so from feat/wave5-remove-lxml

esaurez commented Jun 17, 2026


Summary

Removes the lxml built-in shim from CPython entirely. lxml is a third-party package, not a CPython stdlib module, and the way it was integrated on Nanvix diverges from how upstream CPython loads third-party extensions. This PR aligns Nanvix with upstream: CPython should contain no lxml at all.

How upstream CPython handles lxml

It doesn't: lxml is third-party. On Linux, `pip install lxml` installs (from a wheel or a source build) native CPython extension modules (lxml/etree.cpython-<plat>.so in site-packages, linked against libxml2/libxslt, e.g. via DT_NEEDED entries when built against the system libraries), which CPython's standard ExtensionFileLoader dlopens. Zero CPython modifications, no Setup.local entry, no C trampoline, no Python bridge.
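For reference, here is a minimal Python sketch (stdlib only) of the spec CPython's path-based finder produces for an extension module under site-packages. `extension_spec` and `init_symbol` are hypothetical helper names, and the site-packages path is an assumption. The file does not have to exist, because `importlib.util.spec_from_file_location` chooses the loader from the filename suffix alone. The second helper shows why dotted names need no special handling on this path: for ASCII module names the loader looks up `PyInit_` followed by only the last name component (PEP 489).

```python
import importlib.machinery
import importlib.util
import os


def extension_spec(dotted_name, site_packages):
    """Return the ModuleSpec the importer would use for an extension module.

    Hypothetical helper: maps "lxml.etree" to
    <site_packages>/lxml/etree<suffix>, where <suffix> is the first entry of
    EXTENSION_SUFFIXES (e.g. ".cpython-312-x86_64-linux-gnu.so" on Linux).
    The file is never opened; only its name is inspected.
    """
    package, _, leaf = dotted_name.rpartition(".")
    parts = package.split(".") if package else []
    suffix = importlib.machinery.EXTENSION_SUFFIXES[0]
    path = os.path.join(site_packages, *parts, leaf + suffix)
    # With no explicit loader, spec_from_file_location matches the suffix
    # against the supported loaders and picks ExtensionFileLoader.
    return importlib.util.spec_from_file_location(dotted_name, path)


def init_symbol(dotted_name):
    """Init function a dlopen'd extension must export (ASCII names only)."""
    return "PyInit_" + dotted_name.rpartition(".")[2]


if __name__ == "__main__":
    spec = extension_spec("lxml.etree", os.path.join(os.sep, "opt", "site-packages"))
    print(spec.name, type(spec.loader).__name__, spec.origin)
    print(init_symbol("lxml.etree"))
```

On a host where the file really exists, `importlib.util.module_from_spec(spec)` would hand it to ExtensionFileLoader, which dlopens it and calls `PyInit_etree`; no stdlib build step is involved.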

The band-aid being removed

Because makesetup cannot express dotted module names, Nanvix registered the Cython output under the flat name `_lxml_etree` via Modules/Setup.local. Layered around it were a C trampoline (lxml_etree_builtin.c) that forwarded the PyInit call, a Python bridge (lxml/etree.py) that re-exported the module under the dotted name lxml.etree, and a system shared library (liblxml_etree.so) separate from the extension. That is two artifacts where Linux has one, plus several glue layers that exist only to route a third-party package through the stdlib build machinery.
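The naming mismatch behind the trampoline can be sketched in a few lines. makesetup builds the init symbol it expects, `PyInit_<name>`, by pasting the Setup module name in verbatim, so the name must be a valid C identifier; Cython names the init function of `lxml.etree` after its last component. The Python below only mimics those two naming rules: it is not makesetup or Cython, and `inittab_entry` / `cython_init_symbol` are hypothetical names.

```python
import re

# A C identifier: letter or underscore, then letters, digits, underscores.
_C_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def inittab_entry(setup_name):
    """Mimic the config.c lines makesetup writes for one built-in module.

    Illustration only: the Setup name is pasted into the symbol verbatim,
    so a dotted name would produce an invalid C identifier.
    """
    symbol = "PyInit_" + setup_name
    if not _C_IDENT.match(symbol):
        raise ValueError(
            f"{setup_name!r} cannot be expressed in Setup: "
            f"{symbol!r} is not a C identifier"
        )
    decl = f"extern PyObject* {symbol}(void);"
    entry = f'{{"{setup_name}", {symbol}}},'
    return decl, entry


def cython_init_symbol(dotted_name):
    """Init symbol Cython exports for a module: last name component (ASCII)."""
    return "PyInit_" + dotted_name.rpartition(".")[2]


if __name__ == "__main__":
    print(inittab_entry("_lxml_etree"))
    print(cython_init_symbol("lxml.etree"))
```

Registering `_lxml_etree` makes the build expect `PyInit__lxml_etree`, while the Cython module for lxml.etree exports `PyInit_etree`; presumably that is the gap lxml_etree_builtin.c bridged by forwarding one to the other.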

What this restores

lxml will come back the upstream way once the nanvix/lxml port emits native CPython extension modules into site-packages. At that point CPython's standard importer loads lxml.etree directly, with zero CPython-side changes. This is tracked as a follow-up change to the nanvix/lxml port repo.
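Once the port ships native extensions, one way to confirm that lxml.etree comes through the standard importer rather than a built-in shim is to inspect its spec. The sketch below is a hypothetical check, not part of this PR or its test suite; `describe_loader` is an illustrative name, and it only classifies what `importlib.util.find_spec` reports.

```python
import importlib.machinery
import importlib.util
import sys


def describe_loader(dotted_name):
    """Classify how the import system would load `dotted_name`.

    Returns "builtin", "frozen", "extension", "source", "bytecode",
    "other" or "missing". find_spec imports parent packages first, so a
    missing parent (e.g. no "lxml" package) raises ModuleNotFoundError.
    """
    try:
        spec = importlib.util.find_spec(dotted_name)
    except ModuleNotFoundError:
        return "missing"
    if spec is None:
        return "missing"
    if spec.origin == "built-in":
        return "builtin"
    if spec.origin == "frozen":
        return "frozen"
    loader = spec.loader
    if isinstance(loader, importlib.machinery.ExtensionFileLoader):
        return "extension"
    if isinstance(loader, importlib.machinery.SourceFileLoader):
        return "source"
    if isinstance(loader, importlib.machinery.SourcelessFileLoader):
        return "bytecode"
    return "other"


if __name__ == "__main__":
    # After the port change the expectation is "extension", and no
    # flat-name shim should be compiled into the interpreter.
    print("lxml.etree:", describe_loader("lxml.etree"))
    print("_lxml_etree built in:", "_lxml_etree" in sys.builtin_module_names)
```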

Removed

  • Modules/lxml_etree_builtin.c, Modules/lxml_elementpath_builtin.c (the flat-name PyInit trampolines).
  • .nanvix/setup_local.py: _lxml_etree / _lxml_elementpath entries.
  • .nanvix/lxml.py: deleted. Its generate_setup_local() (general Setup.local generation, not lxml-specific despite the file name) is moved into .nanvix/setup_local.py next to render_setup_local(); the unused clear_setup_local() and the lxml runtime staging (stage_lxml_runtime / _ETREE_SHIM) are dropped.
  • .nanvix/build.py / package.py / test.py: drop the lxml module import, the lxml runtime staging calls, and the standalone lxml.etree import/parse smoke snippet.
  • .nanvix/z.py: drop lxml / libxml2 / libxslt from _DEP_EXPECTED_LIBS and the now-dead python-packages/ payload extraction (only lxml used it; the native-extension form will not).
  • .nanvix/nanvix.toml: drop libxml2 / libxslt / lxml dependency pins.
  • .nanvix/config.py: drop test_nanvix_lxml from the regrtest list.
  • Lib/test/test_nanvix_lxml.py: deleted.

Base / relationship to other PRs

This PR targets the feat/wave5-pr-a-stdlib-so branch.

Validation

  • Modules/Setup.local renders cleanly with exactly one `*static*` and one `*shared*` marker and no lxml entries.
  • ./z lint (black + pyright) clean.
  • pre-commit run clean on all changed files.
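The first validation item can be expressed as a small check. This is a hypothetical sketch, not the PR's tooling: `check_setup_local` is an illustrative name, and it assumes the rendered file uses makesetup's `*static*` / `*shared*` section markers on their own lines with `#` comments, treating any non-comment line that mentions lxml as a leftover entry.

```python
def check_setup_local(text):
    """Return a list of problems found in a rendered Modules/Setup.local.

    Checks for exactly one *static* and one *shared* marker line and for
    any remaining line mentioning lxml. An empty list means the file passes.
    """
    problems = []
    markers = {"*static*": 0, "*shared*": 0}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if line in markers:
            markers[line] += 1
            continue
        if "lxml" in line:
            problems.append(f"line {lineno}: unexpected lxml entry: {line!r}")
    for marker, count in markers.items():
        if count != 1:
            problems.append(f"expected exactly one {marker} marker, found {count}")
    return problems
```

For example, `check_setup_local("*static*\n\n*shared*\n")` returns an empty list, while a file that still lists `_lxml_etree lxml_etree_builtin.c` is reported with its line number.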

This removes the entire band-aid from CPython. lxml will return the
upstream way once the nanvix/lxml port emits native CPython extension
modules into site-packages -- at which point CPython's standard
importer loads lxml.etree directly with zero CPython-side changes.
Tracked in nanvix-todo/lxml-port-ship-as-native-cpython-extensions.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>