[nanvix] E: Phase 1C — build 8 Tier-1 text-codec modules as .so #8

Closed
esaurez wants to merge 1 commit into feat/phase1b-drop-libm-from-math-so from feat/phase1c-tier1-codecs-shared

esaurez (Owner) commented Jun 3, 2026

Summary

Phase 1C of the .a → .so migration (see nanvix-todo/cpython-static-to-shared-migration.md section 5). Promotes the 8 Tier-1 "text codec" stdlib extension modules from being statically linked into python.elf to dlopen-loaded shared objects under lib/python3.12/lib-dynload/. This completes Phase 1 (25 modules total across 1A + 1B + 1C).
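A quick way to see which side of the migration a module is on, without importing it, is to ask the import system for its spec: built-in modules report the origin `"built-in"`, while shared objects report a file path. This is a sketch assuming stock CPython import machinery; `load_origin` and `is_in_lib_dynload` are hypothetical helpers, not part of this PR or of .nanvix/test.py.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def load_origin(name: str) -> Optional[str]:
    """Return where the import system would load `name` from, without importing it.

    "built-in" means the module is still statically linked into the interpreter;
    a filesystem path means it will be loaded from a file (dlopen for a .so).
    None means the module cannot be found at all.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_in_lib_dynload(name: str) -> bool:
    """True if `name` resolves to a file directly inside a lib-dynload/ directory."""
    origin = load_origin(name)
    if origin is None or origin == "built-in":
        return False
    return Path(origin).parent.name == "lib-dynload"


if __name__ == "__main__":
    for mod in ("unicodedata", "_multibytecodec", "_codecs_jp"):
        print(mod, load_origin(mod), is_in_lib_dynload(mod))
```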

Modules moved to *shared*

| Module | .so size |
|---|---|
| unicodedata | 1193 KB (Unicode database tables; by far the biggest stdlib extension) |
| _codecs_jp | 262 KB |
| _codecs_hk | 168 KB |
| _codecs_cn | 155 KB |
| _codecs_kr | 145 KB |
| _multibytecodec | 147 KB |
| _codecs_tw | 115 KB |
| _codecs_iso2022 | 76 KB |
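The size figures in this PR can be cross-checked with a little arithmetic. This sketch copies the per-module sizes from the table above and the python.elf sizes from the Validation section; treating 1 MB as 1024 KB is an assumption about how the sizes were measured.

```python
# Per-module .so sizes in KB, copied from the table above.
SO_SIZES_KB = {
    "unicodedata": 1193,
    "_codecs_jp": 262,
    "_codecs_hk": 168,
    "_codecs_cn": 155,
    "_codecs_kr": 145,
    "_multibytecodec": 147,
    "_codecs_tw": 115,
    "_codecs_iso2022": 76,
}

total_kb = sum(SO_SIZES_KB.values())

# python.elf sizes reported for Phase 1B and Phase 1C.
elf_before_mb, elf_after_mb = 19.18, 17.48
shrink_mb = elf_before_mb - elf_after_mb

print(f"total .so size: {total_kb} KB (~{total_kb / 1024:.2f} MB)")
print(f"python.elf shrink: {shrink_mb:.2f} MB")
```

The eight files together (about 2.2 MB) are larger than the 1.70 MB removed from python.elf. The PR does not break down the difference; per-object ELF overhead such as headers and dynamic symbol tables is one plausible contributor.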

None of the eight depends on an external library; all are pure C with embedded data tables.
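One way to double-check the no-external-libraries claim is to read each .so's DT_NEEDED entries from `readelf -d` output. This sketch only parses that text; `needed_libraries` and `unexpected_deps` are hypothetical helpers, and the `allowed` set is an assumption supplied by the caller, since the PR does not list which runtime libraries the Nanvix toolchain records in these objects.

```python
import re
from typing import List, Set

# Matches the bracketed library name in `readelf -d` lines such as
#  0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
_NEEDED_RE = re.compile(r"\(NEEDED\).*\[([^\]]+)\]")


def needed_libraries(readelf_dynamic: str) -> List[str]:
    """Return the DT_NEEDED library names from `readelf -d` output, in order."""
    matches = (_NEEDED_RE.search(line) for line in readelf_dynamic.splitlines())
    return [m.group(1) for m in matches if m]


def unexpected_deps(readelf_dynamic: str, allowed: Set[str]) -> List[str]:
    """Return DT_NEEDED entries that are not in the caller-supplied allow-list."""
    return [lib for lib in needed_libraries(readelf_dynamic) if lib not in allowed]
```

Feeding it the output of `readelf -d` for each file under lib-dynload/ should yield an empty `unexpected_deps` list for all eight modules if the claim holds.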

Test coverage

A new phase1c_snippet in .nanvix/test.py imports each module, asserts that it is not in sys.builtin_module_names, exercises one trivial API (unicodedata.lookup, _multibytecodec.__create_codec, _codecs_<region>.getcodec), and prints __file__. The Phase 1A/1B probes are retained.
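A minimal sketch of what such a probe could look like, using only standard CPython APIs. `PHASE1C_PROBES`, `probe`, and `_has_create_codec` are hypothetical names; the actual phase1c_snippet may be structured differently. The codec names passed to `getcodec` are the ones CPython's own `encodings` package uses for each region module.

```python
import importlib
import sys
from types import ModuleType
from typing import Callable, Dict, Optional


def _has_create_codec(m: ModuleType) -> None:
    # Only confirm the entry point is exported; calling it needs a codec capsule,
    # which the per-region getcodec() calls construct internally.
    if not callable(getattr(m, "__create_codec", None)):
        raise AssertionError("_multibytecodec.__create_codec is missing")


# Hypothetical probe table: module name -> one cheap API call on that module.
PHASE1C_PROBES: Dict[str, Callable[[ModuleType], object]] = {
    "unicodedata": lambda m: m.lookup("LATIN SMALL LETTER A"),
    "_multibytecodec": _has_create_codec,
    "_codecs_cn": lambda m: m.getcodec("gb2312"),
    "_codecs_hk": lambda m: m.getcodec("big5hkscs"),
    "_codecs_iso2022": lambda m: m.getcodec("iso2022_jp"),
    "_codecs_jp": lambda m: m.getcodec("shift_jis"),
    "_codecs_kr": lambda m: m.getcodec("euc_kr"),
    "_codecs_tw": lambda m: m.getcodec("big5"),
}


def probe(name: str, require_shared: bool = True) -> Optional[str]:
    """Import `name`, exercise one API, and return its resolved __file__.

    With require_shared=True, fail if the module is still compiled into the
    interpreter (i.e. listed in sys.builtin_module_names).
    """
    if require_shared and name in sys.builtin_module_names:
        raise AssertionError(f"{name} is still statically linked")
    module = importlib.import_module(name)
    PHASE1C_PROBES[name](module)  # call into the module to confirm it is usable
    return getattr(module, "__file__", None)  # built-in modules have no __file__
```

On the Nanvix build, `probe(name)` for each key should return a path under lib-dynload/. On a host Python where some of these modules are built in, `require_shared=False` still exercises the APIs without the builtin check.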

Validation

Tested on phase0-llfix toolchain overlay:

  • All 8 .so files installed under lib-dynload/.
  • nm python.elf no longer shows PyInit_<name> for any of the 8.
  • python.elf size: 19.18 MB (Phase 1B) → 17.48 MB (Phase 1C), −1.70 MB. This is the biggest single-phase reduction so far, driven by the Unicode database and CJK codec tables.
  • Hello + Phase 1A + Phase 1B + Phase 1C import probes + lxml + HTTP smoke + full regrtest 160/160 PASS in standalone mode.
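The nm check in the list above can be scripted. This sketch assumes a host `nm` that can read the target ELF (for a cross build that may need to be the toolchain's own `nm`); `leftover_init_symbols` and `check_elf` are hypothetical helpers, not part of the PR.

```python
import subprocess
from typing import Iterable, List

PHASE1C_MODULES = [
    "unicodedata", "_multibytecodec", "_codecs_cn", "_codecs_hk",
    "_codecs_iso2022", "_codecs_jp", "_codecs_kr", "_codecs_tw",
]


def leftover_init_symbols(nm_output: str, modules: Iterable[str]) -> List[str]:
    """Return the PyInit_<name> symbols for `modules` that still appear in nm output."""
    wanted = {f"PyInit_{m}" for m in modules}
    found = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # nm prints "<address> <type> <name>" for defined symbols and
        # "<type> <name>" for undefined ones; the symbol name is the last field.
        if fields and fields[-1] in wanted:
            found.add(fields[-1])
    return sorted(found)


def check_elf(path: str = "python.elf") -> List[str]:
    """Run nm on `path` and report any Phase 1C init symbols still present."""
    out = subprocess.run(["nm", path], capture_output=True, text=True, check=True).stdout
    return leftover_init_symbols(out, PHASE1C_MODULES)
```

An empty list from `check_elf` corresponds to the "no PyInit_<name> for any of the 8" result reported above.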

Prerequisites

Stacked on Phase 1B-drop-libm (esaurez/cpython#7). No additional newlib / gcc / nanvix PRs needed.

Risk

Mechanical configuration change: 8 entries added to the *shared* block of Setup.local, following the same pattern as Phase 1A/1B.
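The Setup.local change is small enough to sketch. The block below renders a *shared* section in Modules/Setup syntax (one `name source...` line per module) from Python, in the spirit of the generation done in .nanvix/docker.py. The dict and `render_shared_block` are hypothetical, and the source paths are assumed to match those in CPython 3.12's Modules/Setup.stdlib.in; the real generator may differ.

```python
from typing import Dict

# Module name -> source path relative to Modules/ (assumed to match
# CPython 3.12's Modules/Setup.stdlib.in).
PHASE1C_SHARED: Dict[str, str] = {
    "unicodedata": "unicodedata.c",
    "_multibytecodec": "cjkcodecs/multibytecodec.c",
    "_codecs_cn": "cjkcodecs/_codecs_cn.c",
    "_codecs_hk": "cjkcodecs/_codecs_hk.c",
    "_codecs_iso2022": "cjkcodecs/_codecs_iso2022.c",
    "_codecs_jp": "cjkcodecs/_codecs_jp.c",
    "_codecs_kr": "cjkcodecs/_codecs_kr.c",
    "_codecs_tw": "cjkcodecs/_codecs_tw.c",
}


def render_shared_block(modules: Dict[str, str]) -> str:
    """Render a Setup-style *shared* section: the marker line, then one entry per module."""
    lines = ["*shared*"]
    lines += [f"{name} {src}" for name, src in modules.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_shared_block(PHASE1C_SHARED), end="")
```

In a real generator these lines would sit alongside the Phase 1A/1B entries in the same *shared* section.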

Phase 1C of the .a -> .so migration (see
nanvix-todo/cpython-static-to-shared-migration.md section 5).
Builds on Phase 1B (#6, #7) by promoting the remaining 8 Tier-1
"text codec" stdlib extension modules from statically linked into
python.elf to dlopen-loaded shared objects under
lib/python3.12/lib-dynload/.

Modules moved to *shared* in Modules/Setup.local generation
(.nanvix/docker.py):

- unicodedata: Unicode database lookups (the big one, with about
  1.2 MB of Unicode data tables).
- _multibytecodec: shared CJK codec infrastructure.
- _codecs_cn / _codecs_hk / _codecs_iso2022 / _codecs_jp /
  _codecs_kr / _codecs_tw: per-region CJK codec tables.

None of the eight reference external libraries; they are pure C
with embedded data tables. They link against the same -lc /
runtime symbols that the rest of the Phase 1 modules use.

Test coverage (.nanvix/test.py):

- New phase1c_snippet imports each module, asserts it is NOT in
  sys.builtin_module_names, exercises one trivial API call to
  confirm dlopen + PyInit_<name> succeeded (unicodedata.lookup,
  _multibytecodec.__create_codec, _codecs_<region>.getcodec), and
  prints the resolved __file__ path. Phase 1A/1B probes retained.

Validation on local toolchain (phase0-llfix):

- All 8 new .so files produced and installed under lib-dynload/
  (unicodedata 1193K, _codecs_jp 262K, _codecs_hk 168K,
  _codecs_cn 155K, _codecs_kr 145K, _multibytecodec 147K,
  _codecs_tw 115K, _codecs_iso2022 76K — total ~2.2 MB across
  the eight files).
- nm python.elf no longer shows PyInit_<name> for any of the 8.
- python.elf size: 19.18 MB (Phase 1B) -> 17.48 MB (Phase 1C),
  -1.70 MB. Biggest single-phase reduction so far because the
  CJK codec tables and the Unicode database are large.
- Hello + Phase 1A + Phase 1B + Phase 1C import probes + lxml +
  HTTP smoke + full regrtest 160/160 PASS in standalone mode.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>