build: optimize startup performance for microvm/standalone mode#365
build: optimize startup performance for microvm/standalone mode#365ppenna wants to merge 1 commit into
Conversation
Combine multiple performance optimizations to reduce cold-boot hello-world execution time from ~600ms to ~450ms, with VM snapshot support enabling sub-200ms restore-boot. Changes from PR #361 - Non-PIE static linking (~20ms saved): - Switch LDFLAGS from -Wl,-pie -Wl,--export-dynamic to -no-pie - Eliminates 100K R_386_RELATIVE relocations at startup Changes from PR #362 - Compiler flag optimizations (~70ms saved): - Add -O3 -fomit-frame-pointer -fno-unwind-tables -fno-asynchronous-unwind-tables to CFLAGS - Remove --with-lto (causes I-cache pressure on constrained microvm) - Add --without-doc-strings and --with-computed-gotos Changes from PR #363 - Test harness optimizations (~40ms saved): - Use --strip-all instead of --strip-debug for smallest binary - Move host binaries (nanvixd, kernel, mkramfs) out of ramfs tree - Build minimal ramfs with only test script and encodings/ - Add -S flag (skip site.py) and PYTHONHASHSEED=0 - Remove timeout wrapper to reduce measurement overhead Changes from PR #318 - VM snapshot support: - Add Modules/nanvix_snapshot.S: assembly syscall helper (int 0x80, nr 35) - Call nanvix_snapshot() after Py_Initialize when NANVIX_SNAPSHOT=1 is set - Enables restore-boot in ~35ms (requires snapshot-capable nanvixd) - Gated by NANVIX_SNAPSHOT=1 env var for backward compatibility
There was a problem hiding this comment.
Pull request overview
This PR consolidates multiple Nanvix microvm/standalone startup optimizations and adds optional VM snapshot triggering to significantly reduce cold-boot and restore-boot times.
Changes:
- Adjust Nanvix build flags for faster startup (non-PIE static linking,
-O3, omit frame pointer, disable unwind tables, remove LTO, drop docstrings, enable computed gotos). - Optimize the standalone hello-world test harness to reduce ramfs size and interpreter startup overhead (minimal ramfs content,
-S,PYTHONHASHSEED=0,--strip-all, host-binary relocation). - Add Nanvix VM snapshot trigger support (assembly syscall helper + env-gated call after initialization).
Show a summary per file
| File | Description |
|---|---|
| Modules/nanvix_snapshot.S | Adds an i686 int 0x80 syscall helper to trigger a Nanvix VM snapshot. |
| Modules/main.c | Calls nanvix_snapshot() after initialization when NANVIX_SNAPSHOT=1 on Nanvix. |
| Makefile.nanvix | Updates compile/link/configure flags, injects snapshot object into build, and streamlines the standalone test ramfs + invocation. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 3/3 changed files
- Comments generated: 1
| ./bin/nanvixd.elf \ | ||
| -bin-dir ./bin -ramfs /tmp/rootfs.img \ | ||
| -- ./bin/python3.12 \ | ||
| "-B /test_hello.py;PYTHONHOME=/ PYTHONDONTWRITEBYTECODE=1" \ | ||
| "-B -S /test_hello.py;PYTHONHOME=/ PYTHONDONTWRITEBYTECODE=1 PYTHONHASHSEED=0" \ | ||
| < /dev/null > /tmp/cpython_test.log 2>&1; \ |
There was a problem hiding this comment.
Removing the timeout 120 wrapper means the standalone hello test can now hang indefinitely if nanvixd.elf or the guest stalls. This is a reliability regression for make test/CI runs; consider keeping a timeout (even if implemented differently) or updating nanvixd invocation to enforce a max runtime. Also, the later failure message still mentions "timed out" even though the wrapper was removed, which can mislead debugging.
Summary
Combine performance optimizations from PRs #361, #362, #363, and #318 to reduce
cold-boot hello-world execution time from ~600ms to ~450ms, with VM snapshot
support enabling sub-200ms restore-boot.
Changes
1. Non-PIE static linking (PR #361, ~20ms saved)
-Wl,-pie -Wl,--export-dynamicto-no-pieR_386_RELATIVErelocations processed at startup2. Compiler flag optimizations (PR #362, ~70ms saved)
-O3: Aggressive optimization (~60ms faster than-Oson microvm)-fomit-frame-pointer: Frees EBP register on i686 (~5ms)-fno-unwind-tables -fno-asynchronous-unwind-tables: Shrinks binary--with-lto: LTO causes I-cache pressure on constrained microvm--without-doc-strings: Reduces .rodata size--with-computed-gotos: Faster bytecode dispatch for cross builds3. Test harness optimizations (PR #363, ~40ms saved)
--strip-allinstead of--strip-debugfor smallest binary footprint-Sflag: Skipsite.pyimport (~10ms)PYTHONHASHSEED=0: Fixed hash seed, avoids entropy overheadtimeoutwrapper (avoids extra process, ~5ms)4. VM snapshot support (PR #318)
Modules/nanvix_snapshot.S: Assembly syscall helper (int $0x80, nr 35)nanvix_snapshot()after initialization whenNANVIX_SNAPSHOT=1is setPerformance
Note
The 200ms target is achievable via VM snapshot restore (
NANVIX_SNAPSHOT=1),which requires a snapshot-capable nanvixd. Cold-boot optimizations alone bring
startup from ~600ms to ~450ms.