Live bootstrap to guix #578

Draft
vxtls wants to merge 270 commits into fosslinux:master from vxtls:live-bootstrap-to-guix

@vxtls

vxtls commented Mar 15, 2026

No description provided.

vxtls added 30 commits February 22, 2026 13:00
…hs, and disable unused-but-set-variable as error
feat(steps-guix): add libgcrypt-1.12.1 default build with gcc-detected host and pkg-config path
feat(steps-guix): add guile-gcrypt-0.5.0 with dynamic libgcrypt prefix and ld library path
@vxtls
Author

vxtls commented Apr 9, 2026

> But now, I'm facing a new problem, where one of the "module-import-compiled" packages fails to build, apparently because contents of the host Guile (3.0.11) leak into the Guix environment, where the local Guile (3.0.9) can't execute them. I'm still working on solving this.

What's the status of this issue? One thing that puzzles me is, aren't all Guix builds supposed to run in a chroot or namespace environment?
EDIT: Can I have the log?
EDIT 2: I am currently running the ISO build to see whether I can reproduce the problem.

@vxtls
Author

vxtls commented Apr 13, 2026

> Copying to the store directly didn't work. Modifying the package definition to use the alternative URL did.
>
> But now, I'm facing a new problem, where one of the "module-import-compiled" packages fails to build, apparently because contents of the host Guile (3.0.11) leak into the Guix environment, where the local Guile (3.0.9) can't execute them. I'm still working on solving this.

  GEN      etc/init.d/guix-daemon
  GEN      etc/guix-daemon.conf
  GEN      etc/guix-publish.conf
make[2]: Leaving directory '/tmp/guix-build-guix-1.5.0rc1.drv-0/source'
make[1]: Leaving directory '/tmp/guix-build-guix-1.5.0rc1.drv-0/source'
phase `build' succeeded after 4432.3 seconds
starting phase `copy-bootstrap-guile'
accepted connection from pid 8350, user nixbld
accepted connection from pid 8378, user nixbld
Backtrace:
In ice-9/boot-9.scm:
  1752:10 10 (with-exception-handler _ _ #:unwind? _ # _)
In unknown file:
           9 (apply-smob/0 #<thunk 7ffff77352a0>)
In ice-9/boot-9.scm:
    724:2  8 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
In ice-9/eval.scm:
    619:8  7 (_ #(#(#<directory (guile-user) 7ffff773ac80>)))
In ice-9/command-line.scm:
   185:19  6 (_ #<input: string 7ffff7734850>)
In unknown file:
           5 (eval (begin (use-modules (guix)) (with-store store …)) #)
In ice-9/boot-9.scm:
  1752:10  4 (with-exception-handler _ _ #:unwind? _ # _)
In guix/store.scm:
   690:37  3 (thunk)
In ice-9/eval.scm:
    619:8  2 (_ #(#(#(#(#<directory (guile-user) 7ffff773a…>) …) …) …))
In unknown file:
           1 (symlink "/tmp/guix-tests/store/r26xz0l0wb2x0kil3c539f…" …)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
In procedure symlink: File exists
error: in phase 'copy-bootstrap-guile': uncaught exception:
%exception #<&invoke-error program: "./test-env" arguments: ("guile" "-c" "(begin (use-modules (guix)) (with-store store (let* ((item (add-to-store store \"guile-static-stripped-2.0.9-i686-linux.tar.xz\" #f
phase `copy-bootstrap-guile' failed after 0.6 seconds
command "./test-env" "guile" "-c" "(begin (use-modules (guix)) (with-store store (let* ((item (add-to-store store \"guile-static-stripped-2.0.9-i686-linux.tar.xz\" #f \"sha256\" \"/gnu/store/29dkam4a6sdz1pn81
build process 12 exited with status 256
bash-5.3# ls  /tmp/guix-tests
ls: cannot access '/tmp/guix-tests': No such file or directory

Is it something like this?

@Googulator
Collaborator

Googulator commented Apr 13, 2026

No, it was a different issue. Sourcing the Guix profile (/var/guix/profiles/per-user/root/current-guix/etc/profile) solved that, but it causes a new, strange issue where the ISO build will fail instantly when called by the main build script, or manually in the primary TTY1 terminal. Switching to TTY2 and calling it there makes it work.

I did encounter the issue you describe - I solved it differently, by making the 2nd copy operation conditional on the i686 and x86_64 bootstrap guile paths being different.
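
A minimal sketch of that conditional second copy, written in shell for illustration only. The actual fix is a Scheme patch to the Guix package definition, and the helper names `link_bootstrap_guile` and `copy_bootstrap_guiles` are hypothetical:

```shell
#!/bin/sh
# Illustration only: the real change lives in Guix's Scheme package definition.
# Both function names and their arguments are hypothetical.

# Symlink SRC into DEST_DIR under its own base name.
# ln -s fails with "File exists" if the link is already there.
link_bootstrap_guile() {
    src=$1
    dest_dir=$2
    ln -s "$src" "$dest_dir/$(basename "$src")"
}

# Copy the i686 tarball, then the x86_64 one only if it is a different path.
copy_bootstrap_guiles() {
    i686_guile=$1
    x86_64_guile=$2
    dest_dir=$3
    link_bootstrap_guile "$i686_guile" "$dest_dir" || return 1
    # When both architectures resolve to the same store path, a second copy
    # would collide with the first, so skip it.
    if [ "$i686_guile" != "$x86_64_guile" ]; then
        link_bootstrap_guile "$x86_64_guile" "$dest_dir" || return 1
    fi
}
```

Called with the same path twice, `copy_bootstrap_guiles` creates one link and returns success instead of failing on the duplicate.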

Now, I'm facing another issue, where tests fail for the "guix-1.5.0rc1.drv" derivation. The cause seems to be that it's trying to download the official bootstrap binaries to run the tests with, but is failing to do so. IMO the right way to fix this is to inject the patches we use when building our Guix into this derivation, so it too uses the locally built bootstrap binaries instead.

@vxtls
Author

vxtls commented Apr 13, 2026

> I did encounter the issue you describe - I solved it differently, by making the 2nd copy operation conditional on the i686 and x86_64 bootstrap guile paths being different.

What I am doing is copying the original bootstrap Guile directory and adding i686.txt to one copy and x86_64.txt to the other, so that their hash values differ. This way, Guix won't fail during the copy operation because a file already exists at the destination (currently being tested; it just failed because the patch wasn't written correctly).
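
The effect of the marker files can be sketched as follows. This is an illustration under stated assumptions: `tree_digest` is a hypothetical stand-in that hashes a sorted file list with `sha256sum`, whereas Guix computes store hashes over a nar serialization, and the directory contents are placeholders.

```shell
#!/bin/sh
# Stand-in digest for a directory tree: hash each file (sorted by path),
# then hash the resulting list. Not the hash Guix itself uses.
tree_digest() {
    (cd "$1" && find . -type f | LC_ALL=C sort | xargs sha256sum | sha256sum | cut -d' ' -f1)
}

work=$(mktemp -d)
mkdir "$work/orig"
echo 'placeholder contents' > "$work/orig/guile"

# Two plain copies are indistinguishable by content.
cp -r "$work/orig" "$work/i686"
cp -r "$work/orig" "$work/x86_64"
plain_i686=$(tree_digest "$work/i686")
plain_x86_64=$(tree_digest "$work/x86_64")

# A distinct (empty) marker file in each copy makes the digests diverge.
: > "$work/i686/i686.txt"
: > "$work/x86_64/x86_64.txt"
marked_i686=$(tree_digest "$work/i686")
marked_x86_64=$(tree_digest "$work/x86_64")

if [ "$plain_i686" = "$plain_x86_64" ]; then echo "plain copies: same digest"; fi
if [ "$marked_i686" != "$marked_x86_64" ]; then echo "marked copies: different digests"; fi
rm -rf "$work"
```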

> Now, I'm facing another issue, where tests fail for the "guix-1.5.0rc1.drv" derivation. The cause seems to be that it's trying to download the official bootstrap binaries to run the tests with, but is failing to do so. IMO the right way to fix this is to inject the patches we use when building our Guix into this derivation, so it too uses the locally built bootstrap binaries instead.

Where is the definition file for this?

And another question: do you think we should make the generated ISO installable offline? (By default, the ISO image downloads external substitutes during installation, which undermines our assumption of a fully source-based build.)

@Googulator
Collaborator

The definition file is gnu/packages/package-management.scm - I've attached the patch I used for the double-bootstrap-guile bug. The issue where tests fail due to trying to access the official binaries is also in the same package definition.

IMO we should keep the ISO's ability to use substitutes, but have it default to using the bootstrap machine as its sole substitute source. The packages there are properly bootstrapped, and therefore fine to use as substitutes.

@Googulator
Collaborator

Googulator commented Apr 19, 2026

Currently running a bare metal test with this diff on top of the latest commit:

$ git diff
diff --git a/steps-guix/improve/guix-daemon-and-pull.sh b/steps-guix/improve/guix-daemon-and-pull.sh
index c7625474..f61b5d9f 100644
--- a/steps-guix/improve/guix-daemon-and-pull.sh
+++ b/steps-guix/improve/guix-daemon-and-pull.sh
@@ -218,6 +218,9 @@ mkdir -p /proc /sys /dev "${guix_localstate_dir}/daemon-socket" /var/lib/guix /r
 mount | grep ' on /proc ' >/dev/null 2>&1 || mount -t proc proc /proc
 mount | grep ' on /sys ' >/dev/null 2>&1 || mount -t sysfs sysfs /sys
 mount | grep ' on /dev ' >/dev/null 2>&1 || mount -t devtmpfs devtmpfs /dev
+# tmpfs must be unmounted to avoid overfilling memory
+mount | grep ' on /tmp ' >/dev/null 2>&1 && umount /tmp
+test -f /swapfile && swapon /swapfile
 if ! mount | grep ' on /dev/pts ' >/dev/null 2>&1; then
     mkdir -p /dev/pts
     mount -t devpts devpts /dev/pts
diff --git a/steps-guix/jump/linux64.sh b/steps-guix/jump/linux64.sh
index 1bb6ba93..f942c2a1 100644
--- a/steps-guix/jump/linux64.sh
+++ b/steps-guix/jump/linux64.sh
@@ -37,4 +37,8 @@ else
         --append="console=ttyS0 root=/dev/sda1 init=/init rw rootwait consoleblank=0"
 fi
 quiesce_filesystem_for_kexec
-kexec -e
+if [ "${BARE_METAL}" = True ]; then
+    echo b > /proc/sysrq-trigger || true
+else
+    kexec -e
+fi

I disabled kexec for bare metal, because I couldn't get the framebuffer to work reliably after kexecing from 32-bit 4.14-openela to 64-bit 6.12-gnu. The other change activates the swapfile, and unmounts /tmp before starting the Guix bootstrap - this was needed to avoid going OOM on my 8GiB RAM bootstrap rig.

@vxtls
Author

vxtls commented Apr 19, 2026

I’m also thinking about running a bare-metal setup. Is using Coreboot + SeaBIOS on a modern motherboard a viable option?

@Googulator
Collaborator

Googulator commented Apr 19, 2026

If you can run Coreboot on your board, it should be, especially since qemu's default BIOS implementation is SeaBIOS.

Note that I had to edit the patch in my previous comment, as I got Bash's conditional syntax wrong (it's if...fi, not if...endif).

@vxtls
Author

vxtls commented Apr 19, 2026

Great, I’ll add the patch shortly.
I’m currently experimenting with an offline ISO image (I’ll keep the default build using substitutes in the project), but I’ll include a configuration file that allows for building an image suitable for offline installation. I’m running into some issues at the moment.

@vxtls
Author

vxtls commented Apr 20, 2026

There is currently no reliable method for including the closure in an ISO image and then performing a guided installation from it.

@Googulator
Collaborator

The new patch is malformed: @@ -211,11 +211,8 @@ should be @@ -211,11 +211,4 @@.

With that fixed, I had a test failure in wmin69qfszbjz0mflj2sya8sm2r5c7bs-glib-2.83.3.drv (GLib 2.83.3), that went away on retry. Maybe we should disable tests for it?

@Googulator
Collaborator

OK, after lots of wrangling...

I have an ISO.

More to come tomorrow. I've just come home and am quite tired. It did take quite some wrangling to get here.

@Googulator
Collaborator

So, finally got around to writing it all up.

  1. I encountered the system Guile leakage issue again. Adding . /var/guix/profiles/per-user/root/current-guix/etc/profile before calling guix system image fixed this, but exposed a new issue: environment variables on the host system point to 32-bit libraries, which are incompatible with the 64-bit Guile in the current Guix profile. To fix this, the guix system image invocation needs to be prefixed so that it runs in a clean environment: env -i PATH="$PATH" guix system image ...
  2. Since https://guix.gnu.org/en/blog/2025/privilege-escalation-vulnerabilities-2025/, Guix requires slirp4netns and a working /dev/net/tun to run custom fixed output derivations (i.e. download scripts). These aren't currently bootstrapped before building Guix. I have been able to build slirp4netns using Guix itself (i.e. running guix system image in a guix shell that includes the slirp4netns package), but simply creating /dev/net/tun using mknod proved insufficient - likely some kernel config change is needed. I suggest either using the kernel config from Guix's own kernel package (which is known to work well for this use case), or just reverting the 6 commits for the Guix daemon running in live-bootstrap (but not the actual Guix channel repo!). For now, I used --disable-chroot as a workaround for the affected packages - unfortunately this causes other packages to break, so I had to keep switching chroot on and off manually during the build.
  3. Guix tests would fail, so I disabled them. Looking through the build log, there are 3 test failures:
    3.1. tests/channels.scm:501 (latest-channel-instances, missing introduction for "guix") fails because of the hack to disable channel authentication. We should either disable / XFAIL this test, or add authentication information to the channel repository we create.
    3.2. tests/graph.scm:207 ("bad DAG") seems to depend on having real %bootstrap-inputs (that is, the original bootstrap binaries downloaded from Guix's servers) - disable or patch it.
    3.3. tests/guix-environment fails because it's testing if the 32-bit bootstrap-guile has a %host-type beginning with "i686". In a seemingly musl-specific issue, our bootstrap-guile ends up with a %host-type value of "x86_64-pc-linux-muslx32", as the build environment seemingly detects that the kernel is 64-bit under a 32-bit userland, and Guix then thinks this means it's inappropriately picking a 64-bit bootstrap-guile for a 32-bit build, and fails the test. Ideally, we should fix this in bootstrap-guile itself.
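
The `env -i PATH="$PATH"` prefix from item 1 can be sketched as a small wrapper. This is an illustration, not part of the build scripts: `run_clean` is a hypothetical helper, and `DEMO_LIB_PATH` is a made-up stand-in for whichever host variables point at the 32-bit libraries.

```shell
#!/bin/sh
# Run a command with every inherited environment variable dropped except PATH.
run_clean() {
    env -i PATH="$PATH" "$@"
}

# Stand-in for a host variable that would leak into the child process.
DEMO_LIB_PATH=/hypothetical/lib32
export DEMO_LIB_PATH

# A normal child process inherits the variable; a run_clean child does not.
inherited=$(sh -c 'echo "${DEMO_LIB_PATH:-unset}"')
clean=$(run_clean sh -c 'echo "${DEMO_LIB_PATH:-unset}"')
echo "inherited: $inherited"
echo "clean:     $clean"
```

With this helper, the invocation from item 1 would read `run_clean guix system image ...`, keeping only `PATH` from the sourced profile.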

And then, some bad news:

  1. The generated ISO self-identifies as "Guix 1.5.0rc1", unlike the official ISO, which says "Guix 1.5.0" - this seems to be a Guix tarball packaging error or quirk; probably we are missing a few commits from the official Guix Git repository that actually bring the included Guix package up to the real version 1.5.0.
  2. The generated ISO takes a long time to boot (at least in a Gen 1 Hyper-V VM, where I tried it), and then fails the installation during the disk partitioning stage. I haven't tried a manual installation using guix system, only the semi-graphical installer.

@vxtls
Author

vxtls commented May 4, 2026

Could you explain in detail why the disk partitioning failed? I'm trying a non-default ISO for the setup: a custom ISO that defines as few components as possible. I'm also adding the --no-substitutes option during the final installation, because by default the Guix installation image downloads external binaries, whereas I want a truly source-only installation.

@Googulator
Collaborator

dump.2026-04-28.10.12.04.tar.gz

This is the dump I was able to obtain from the failed installation. Looks like maybe the target device isn't being found.

With that said, I suspect the failure might be due to some packages being built with chroot off, resulting in contamination from the live-bootstrap host environment.

@Googulator
Collaborator

I tried the Manual partitioning option in Setup, only to get an empty list of potential disks to partition.

@vxtls
Author

vxtls commented May 4, 2026

What does lsblk show in the terminal?

@Googulator
Collaborator

Looks like I was dumb - the mounted virtual disk had a malformed partition table on it, which confused Setup. A parted mklabel gpt later, guided partitioning succeeds.
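
For reference, that relabeling step can be sketched as below. `/dev/sdX` is a placeholder, and the `relabel_gpt` helper and its `APPLY` switch are hypothetical. Because `mklabel` discards the existing partition table, the sketch only prints the command unless `APPLY=1` is set.

```shell
#!/bin/sh
# Write a fresh GPT label to DISK, or just print the command by default.
# relabel_gpt and APPLY are hypothetical names used for this sketch.
relabel_gpt() {
    disk=$1
    if [ "${APPLY:-0}" = 1 ]; then
        # -s runs parted non-interactively; this destroys the old partition table.
        parted -s "$disk" mklabel gpt
    else
        echo "would run: parted -s $disk mklabel gpt"
    fi
}

relabel_gpt /dev/sdX
```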

@Googulator
Collaborator

Googulator commented May 4, 2026

image

Looks like it's looking for the local channel repository inside the ISO. Cloning it should work, though.

EDIT: no git in the image, but scp -rq worked. Installation proceeding normally (using upstream substitutes this time, but I reckon it should work without them, as well).

EDIT 2: it's not actually using substitutes; while it did look for them, what it's actually downloading is all source code, because the bootstrap changes made the upstream substitutes not match. Using the L-B system as a substitute server remains a possibility.

@vxtls
Copy link
Copy Markdown
Author

vxtls commented May 7, 2026

Compiling Scheme modules...
make  check-TESTS check-local
make[3]: Entering directory '/tmp/guix-build-guix-1.5.0rc1.drv-0/source'
make[4]: Entering directory '/tmp/guix-build-guix-1.5.0rc1.drv-0/source'
PASS: tests/accounts.scm
PASS: tests/base16.scm
PASS: tests/base32.scm
PASS: tests/base64.scm
PASS: tests/boot-parameters.scm
PASS: tests/bournish.scm
SKIP: tests/builders.scm
SKIP: tests/build-emacs-utils.scm
FAIL: tests/build-utils.scm
PASS: tests/cache.scm
FAIL: tests/challenge.scm
SKIP: tests/channels.scm
PASS: tests/combinators.scm
SKIP: tests/containers.scm
SKIP: tests/cpio.scm
PASS: tests/cve.scm
SKIP: tests/debug-link.scm
make[4]: *** [Makefile:7302: tests/derivations.log] Error 1
make[4]: Leaving directory '/tmp/guix-build-guix-1.5.0rc1.drv-0/source'
make[3]: *** [Makefile:7285: check-TESTS] Error 2
make[3]: Leaving directory '/tmp/guix-build-guix-1.5.0rc1.drv-0/source'
make[2]: *** [Makefile:7533: check-am] Error 2
make[2]: Leaving directory '/tmp/guix-build-guix-1.5.0rc1.drv-0/source'
make[1]: *** [Makefile:7036: check-recursive] Error 1
make[1]: Leaving directory '/tmp/guix-build-guix-1.5.0rc1.drv-0/source'
make: *** [Makefile:7535: check] Error 2

Test suite failed, dumping logs.
error: in phase 'check': uncaught exception:
%exception #<&invoke-error program: "make" arguments: ("check") exit-status: 2 term-signal: #f stop-signal: #f>
phase `check' failed after 27.4 seconds
command "make" "check" failed with status 2
build process 12 exited with status 256
bash-5.3#

@Googulator
Collaborator

Not sure if it was the exact same one, but I did occasionally see a strange failure in the check phase of guix-1.5.0rc1, that wasn't a test failure (which should move on to other tests), but something that instantly interrupted execution of the test suite. It's nondeterministic, and simply retrying helps.
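
As a stopgap while the cause is unknown, the retry can be sketched as a small wrapper. `retry` is a hypothetical helper, not something in the build scripts. It reports each failed attempt on stderr, so the failures stay visible for later investigation.

```shell
#!/bin/sh
# retry MAX CMD...: run CMD up to MAX times.
# Returns 0 on the first success, or the last failing status after MAX attempts.
retry() {
    max=$1
    shift
    attempt=1
    while :; do
        "$@" && return 0
        status=$?
        echo "attempt $attempt of $max failed with status $status" >&2
        if [ "$attempt" -ge "$max" ]; then
            return "$status"
        fi
        attempt=$((attempt + 1))
    done
}
```

The command that drives the build would be passed as the arguments after the attempt count, for example `retry 3 some-build-command`, where `some-build-command` is a placeholder.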

@vxtls
Author

vxtls commented May 7, 2026

> It's nondeterministic, and simply retrying helps.

I suppose part of the concept of reproducible builds is consistency in behavior. I think that if we have time, we should look into the specific cause.

As for tests that fail consistently, what I want to do is fix those failing tests. First, I’ll look into why they’re failing, whether it’s because the hashes don’t match or some other reason.
