fix(src): prevent boo ui freeze from daemon write deadlock #89

Merged
kylecarbs merged 2 commits into main from fix/ui-freeze-nonblocking-daemon-writes on Jun 24, 2026

@kylecarbs commented on Jun 23, 2026

Fixes #85.

Summary

boo ui froze when opening helix/nvim in a freshly created session. The root cause is a daemon-side blocking-write deadlock, now confirmed on Linux with kernel evidence from a live frozen instance (below).

Root cause (confirmed)

The session daemon (src/daemon.zig) runs a single-threaded poll(2) loop and wrote to clients with blocking writes (conn.send -> protocol.writeMsg -> writeAll -> posix.write). Meanwhile, boo ui makes synchronous client.control("info") round-trips to every session every 250ms (refreshSessions), each with a blocking read and no deadline.

When a session produces a large startup repaint (helix/nvim on an 85x144 viewport, with three UIs attached), the daemon's socket send buffer to a ui fills before the ui drains it, and the daemon blocks in write(). That ui is simultaneously blocked reading an info reply from the same daemon. Neither side makes progress: a mutual deadlock. boo ls hangs for the same reason (it sends an info request to every session).

Kernel evidence from the reporter's live frozen instance (NixOS):

| process | state | syscall / wchan |
| --- | --- | --- |
| daemon (pid 2444987) | D/S | write(fd 7) -> unix_stream_sendmsg -> sock_alloc_send_pskb (send buffer full, peer not draining) |
| boo ui x3 | S | read(...) -> unix_stream_recvmsg (the info round-trip) |

This did not reproduce with small test viewports (tiny repaints). The real setup had large viewports and three concurrent UIs, which is enough to fill the send buffer.
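The send-buffer stall described above can be observed in isolation. The following is a minimal sketch in Python rather than the project's Zig, assuming a Unix socket pair where the `ui` end never reads; the names `daemon` and `ui` are stand-ins, and the sender is made non-blocking only so the script can report the point where a blocking write() would have slept.

```python
import socket

# Hypothetical stand-ins: `daemon` plays the session daemon, `ui` plays a
# boo ui that is not reading because it is waiting on an info reply.
daemon, ui = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
daemon.setblocking(False)  # non-blocking only so the demo can see the stall

chunk = b"x" * 4096  # stand-in for startup repaint output
queued = 0
stalled = False
try:
    # Safety bound so the demo always terminates.
    while queued < 64 * 1024 * 1024:
        queued += daemon.send(chunk)
except BlockingIOError:
    # The kernel send buffer is full and the peer is not draining it.
    # A blocking write() would sleep here until the peer reads.
    stalled = True

print(f"send buffer accepted {queued} bytes, stalled={stalled}")
daemon.close()
ui.close()
```

The number of bytes accepted depends on the kernel's socket buffer settings, which is consistent with small repaints never reaching the limit.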

The fix

Two changes; the first removes the deadlock, the second is a client-side safety net.

1. Non-blocking daemon writes (src/daemon.zig, src/protocol.zig)

  • Accepted client sockets are O_NONBLOCK.
  • conn.send queues frames (protocol.appendMsg) and flushes what the socket accepts now; the remainder goes out on POLLOUT.
  • A client that backs up past a generous cap (8 MiB) is dropped (it reconnects and gets a fresh repaint) instead of being buffered without bound.
  • Final frames (detach, exit, replies) drain through a bounded shutdown flush.
  • sweep recomputes passthrough after dropping a client so the window answers its own queries once no client remains.

The daemon never blocks on a slow client again; it keeps draining the PTY and answering control commands.
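The queue, partial flush, and cap-drop behaviour listed above can be sketched as follows. This is an illustration in Python, not the code in src/daemon.zig: the `Conn` class, its method names, and the treatment of frames as opaque bytes are assumptions; only the 8 MiB default cap comes from the description.

```python
import socket


class Conn:
    """Hypothetical per-client output queue (names are illustrative)."""

    def __init__(self, sock, out_cap=8 * 1024 * 1024):
        sock.setblocking(False)  # accepted sockets are O_NONBLOCK
        self.sock = sock
        self.out = bytearray()   # queued, not-yet-written bytes
        self.out_cap = out_cap
        self.dead = False

    def send(self, frame):
        """Queue a frame, write what the socket takes now, enforce the cap."""
        if self.dead:
            return
        self.out += frame
        self.flush()
        if len(self.out) > self.out_cap:
            self.drop()

    def flush(self):
        """Partial, non-blocking writes; keeps the unsent tail in order."""
        while self.out and not self.dead:
            try:
                n = self.sock.send(self.out)
            except BlockingIOError:
                return  # socket is full; the rest goes out on POLLOUT
            except OSError:
                self.drop()  # peer went away
                return
            del self.out[:n]  # compact: discard the bytes already written

    def wants_pollout(self):
        return bool(self.out) and not self.dead

    def drop(self):
        """Discard queued output and mark the connection dead."""
        self.out.clear()
        self.dead = True
        self.sock.close()


# Demo: a peer that never reads is dropped once it backs up past the cap.
a, b = socket.socketpair()
slow = Conn(a, out_cap=1 << 20)  # small cap so the demo stays cheap
for _ in range(64):
    slow.send(b"x" * 65536)
print("dropped:", slow.dead)
b.close()
```

Because `send` never waits, the caller returns to its poll loop immediately regardless of how slow the client is; the cost is the memory held in `out`, which the cap bounds.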

2. Time-bound boo ui control round-trips (src/client.zig, src/main.zig, src/ui.zig)

  • client.control gains an optional timeout_ms: it polls the socket with a deadline before each read and returns error.Timeout if no reply arrives.
  • The ui passes a 3s control_timeout_ms for its sidebar polls (info, cwd, rename, quit); the CLI passes null to preserve its blocking semantics (boo wait already polls client-side).
  • A timeout never deletes the socket: a slow daemon is not a dead one, so sessionInfo and killSession keep the session rather than orphaning a live one.

Even if a daemon ever stops answering for another reason, the ui now gives up and stays interactive instead of freezing.
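A deadline-bounded control round-trip of the kind described above might look like the sketch below. It is written in Python and makes several assumptions that are not in the source: replies are newline-terminated, the deadline covers the whole round-trip rather than each read, and `ControlTimeout` stands in for error.Timeout. The demo uses a listening Unix socket that never calls accept(), so the reply read can only time out.

```python
import os
import select
import socket
import tempfile
import time


class ControlTimeout(Exception):
    """Stand-in for error.Timeout."""


def control(sock, request, timeout_ms=None):
    """Send a request and read one newline-terminated reply (assumed format).

    With timeout_ms=None the read blocks indefinitely, as the CLI path does.
    """
    sock.sendall(request)
    deadline = None
    if timeout_ms is not None:
        deadline = time.monotonic() + timeout_ms / 1000
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    reply = bytearray()
    while not reply.endswith(b"\n"):
        if deadline is not None:
            # Poll with the time left before each read.
            remaining_ms = max(0, int((deadline - time.monotonic()) * 1000))
            if not poller.poll(remaining_ms):
                raise ControlTimeout("no reply before the deadline")
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("daemon closed the connection")
        reply += chunk
    return bytes(reply)


# Demo: the listener never accepts, so no reply can ever arrive.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "daemon.sock")
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(1)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)  # succeeds: the connection waits in the backlog
    try:
        control(client, b"info\n", timeout_ms=100)
        timed_out = False
    except ControlTimeout:
        timed_out = True
    finally:
        client.close()
        listener.close()
print("timed out:", timed_out)
```

Note that the timeout path only raises; it does not remove the socket file, matching the point that a slow daemon should not be treated as a dead one.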

Testing

  • zig build test-all in Debug and -Doptimize=ReleaseSafe: 214/214 pass, including 3 new unit tests (non-blocking queue/flush order, cap-drop, and control timeout).
  • zig fmt --check clean.
  • Smoke test: boo new -d, ls, peek, kill all work.

Implementation notes

  • protocol.appendMsg frames a message into a growable buffer (the queue), parallel to writeMsg.
  • Conn gains out (queue), out_cap, and a shutdown flag so final frames drain before close; flush does partial, non-blocking writes and compacts the buffer; drop discards and marks the conn dead.
  • The loop registers POLLOUT only when output is queued and flushes before handling input.
  • drainOutbound gives queued finals a bounded (250ms) chance to land at teardown.
  • client.control's deadline uses poll() per read; the new test binds a listening unix socket that never accept()s, so the reply read blocks and must time out.
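Two of the notes above, registering POLLOUT only when output is queued and giving final frames a bounded chance to drain, can be sketched as below. This is a Python illustration with hypothetical helper names (`poll_events`, `drain_outbound`); only the 250ms budget comes from the notes, and the real loop in src/daemon.zig may be structured differently.

```python
import select
import socket
import time


def poll_events(queued):
    """Event mask for one client: ask for POLLOUT only when bytes are queued.

    Requesting POLLOUT on an idle, writable socket would make poll() return
    immediately on every iteration.
    """
    events = select.POLLIN
    if queued:
        events |= select.POLLOUT
    return events


def drain_outbound(sock, out, budget_ms=250):
    """Try to write the bytearray `out` within budget_ms.

    Returns True if everything was written, False if the budget ran out or
    the peer failed. Never waits longer than the budget.
    """
    sock.setblocking(False)
    deadline = time.monotonic() + budget_ms / 1000
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    while out:
        remaining_ms = int((deadline - time.monotonic()) * 1000)
        if remaining_ms <= 0 or not poller.poll(remaining_ms):
            return False  # budget exhausted; caller closes anyway
        try:
            n = sock.send(out)
        except BlockingIOError:
            continue  # raced with the buffer filling; the deadline still bounds us
        except OSError:
            return False
        del out[:n]
    return True


# Demo: a small final frame lands; a large backlog to a stuck peer does not.
a, b = socket.socketpair()
final = bytearray(b"exit\n")
print("final frame drained:", drain_outbound(a, final))

backlog = bytearray(4 * 1024 * 1024)  # peer `b` never reads
print("backlog drained:", drain_outbound(a, backlog, budget_ms=50))
a.close()
b.close()
```

The bounded drain trades completeness for liveness: a client that cannot take its final frames within the budget loses them, but teardown always finishes.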

Generated by Coder Agents on behalf of @kylecarbs.

The daemon ran a single-threaded poll loop and wrote to the attached
client with blocking writes (conn.send -> protocol.writeMsg -> writeAll).
While blocked on a client write it stopped reading the PTY and stopped
servicing control connections. boo ui drains its view socket only between
its own work and, every 250ms, makes synchronous info round-trips to
every session. A freshly started full-screen app (helix, neovim) bursts
output on attach: the daemon's write to the ui backed up at the same
moment the ui blocked on an info round-trip to that daemon, and each
waited on the other. boo froze.

Make per-connection output non-blocking instead. Accepted sockets are
O_NONBLOCK; conn.send queues frames and flushes what the socket takes
now, with the remainder going out on POLLOUT. A client that backs up
past a generous cap is dropped (it reconnects and gets a fresh repaint)
rather than buffered without bound or allowed to wedge the loop. Final
frames (detach, exit, replies) drain through a bounded shutdown flush,
and sweep recomputes passthrough so the window answers its own queries
once a dropped client is gone.

Fixes #85.
@kylecarbs changed the title from "fix(src): keep the session daemon responsive to slow clients" to "fix(src): make session daemon client writes non-blocking (hardening; possible #85)" on Jun 23, 2026
@kylecarbs marked this pull request as draft on June 23, 2026 15:01
The sidebar polls every session over its control socket on a 250ms
cadence (info) and also issues cwd/rename/quit. These reads had no
deadline, so a daemon that stopped answering could hang the ui
indefinitely. The non-blocking daemon writes already break that
deadlock; this adds a client-side safety net so a stuck daemon can
never freeze the ui again.

client.control gains an optional timeout_ms: it polls the socket with a
deadline before each read and returns error.Timeout if no reply
arrives. The ui passes control_timeout_ms (3s); the CLI passes null to
preserve its blocking semantics (boo wait already polls client-side). A
timeout never deletes the socket, since a slow daemon is not a dead
one, so sessionInfo and killSession keep the session instead of
orphaning a live one.
@kylecarbs changed the title from "fix(src): make session daemon client writes non-blocking (hardening; possible #85)" to "fix(src): prevent boo ui freeze from daemon write deadlock" on Jun 24, 2026
@kylecarbs marked this pull request as ready for review on June 24, 2026 02:53
@kylecarbs kylecarbs merged commit aa2edb8 into main Jun 24, 2026
5 checks passed
@kylecarbs kylecarbs mentioned this pull request Jun 24, 2026

Development

Successfully merging this pull request may close these issues.

Opening helix or neovim in boo ui freezes boo