Skip to content

peer ipc hygiene#63

Open
dlipicar wants to merge 3 commits into
masterfrom
claude/peer-ipc-hygiene
Open

peer ipc hygiene#63
dlipicar wants to merge 3 commits into
masterfrom
claude/peer-ipc-hygiene

Conversation

@dlipicar
Copy link
Copy Markdown
Contributor

No description provided.

dlipicar and others added 3 commits May 20, 2026 14:51
`RemoteTransportHost::unpublishObject` was a no-op even though
`LogosAPIProvider` calls it during teardown — the QRO source-side
state on the host's `QRemoteObjectRegistryHost` lived forever once
published, leaving in-flight signals/replies dispatching through
torn-down peer connections. One half of the hygiene gap the
logos_host SIGBUS investigation surfaced (the other half lives in
logos-capability-module's outbound informModuleToken on a possibly-
dead peer).

Track published objects in a name → QPointer<QObject> map so
unpublishObject can look up the right QObject* and call
disableRemoting() on it. QPointer guards against the rare case
where the QObject is destroyed by its owner before unpublish runs
(QRO's own destroyed() hook handles that path; we just need to
not dereference a dangling pointer to call disableRemoting).

This is a hygiene improvement, not a complete fix for the underlying
Qt-Remote-Objects mid-disconnect serialiser race that started the
SIGBUS — that race lives in QRO's CodecBase::send walking parameter
metatypes while a peer's socket is being torn down on the same node.
Cleaning up the source side promptly shrinks the window in which
that race can fire.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every outbound RPC in the SDK funnels through one of four methods on
LogosAPIConsumer: invokeRemoteMethod (sync), invokeRemoteMethodAsync,
informModuleToken, informModuleToken_module. All four call
m_transport->requestObject(...) then dispatch on the returned
LogosObject* — bypassing the existing isConnected() check inside
LogosAPIConsumer::requestObject's wrapper.

A synchronous outbound on a transport whose peer is mid-teardown is
the riskiest moment in the codebase. Observed as a Qt-Remote-Objects
source-codec race that SIGBUS'd the entire logos_host when
capability_module talked to a ui-host child that had just received
SIGTERM (CodecBase::send dereferencing an invalidated metatype fn
pointer during source-side teardown).

Adds a cached-state isConnected() check at the top of each of those
four methods, before any transport touch. On a known-disconnected
peer the call short-circuits with a structured failure (default
QVariant for sync, callback fires with default QVariant on the next
event-loop turn for async, false for informModuleToken*) — strictly
better than crashing the daemon.

isConnected() is documented on the LogosTransportConnection base
class as a hot-path contract: implementations MUST return cached
state and be O(1). All four current transports (qt_remote, plain,
qt_local, mock) already satisfy this — single field reads or
literal true.

This replaces the per-call-site guard previously added in
logos-capability-module's CapabilityModuleImpl::requestModule.
Every existing and future outbound RPC path now benefits without
each call site having to remember to guard.

The race is not eliminated — the socket can still tear down between
the check and the wire write — but the window where the peer is
already known-gone (the most common case) is closed. A proper fix
upstream in Qt-Remote-Objects (CodecBase::send checking device
validity before dereferencing the metatype table) would close the
remaining sliver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 20, 2026 18:58
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a central “peer-liveness guard” to prevent outbound IPC/RPC calls from touching a transport that is already known-disconnected, and adds test and documentation coverage for the behavior. It also improves Qt Remote Objects host hygiene by making unpublishObject() actually disable remoting for previously published objects, and extends the mock transport so tests can force a disconnected state.

Changes:

  • Add a transport isConnected() short-circuit in LogosAPIConsumer for sync/async invokes and token propagation calls.
  • Extend mock infrastructure (MockStore / LogosMockSetup) to simulate disconnect/reconnect, and add a dedicated unit test suite covering the guard.
  • Improve qt_remote host-side cleanup by tracking published objects and calling disableRemoting() in unpublishObject(); document the new guard and the isConnected() hot-path contract.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
cpp/logos_api_consumer.cpp Adds centralized peer-liveness checks to short-circuit outbound calls when transport is disconnected.
cpp/logos_transport.h Documents the hot-path/O(1) contract for LogosTransportConnection::isConnected().
cpp/implementations/mock/mock_transport.cpp Makes mock isConnected() reflect MockStore state for guard testing.
cpp/implementations/mock/mock_store.h Adds atomic mocked connection state and API to flip it.
cpp/implementations/mock/mock_store.cpp Implements isConnected()/setConnected() and resets connection state on reset().
cpp/implementations/mock/logos_mock.h Exposes disconnect()/reconnect() helpers for tests via LogosMockSetup.
cpp/implementations/qt_remote/remote_transport.h Adds bookkeeping for published objects to support proper unpublish.
cpp/implementations/qt_remote/remote_transport.cpp Implements real unpublishObject() by disabling remoting on the tracked object.
tests/sdk/test_peer_liveness_guard.cpp New tests validating guard behavior across sync/async invoke and token methods.
tests/sdk/CMakeLists.txt Adds the new test file to the SDK test target.
docs/docs.md Documents the peer-liveness guard behavior and transport expectations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cpp/logos_transport.h
Comment on lines +67 to 79
* @brief Check if currently connected.
*
* Hot-path contract: implementations MUST return cached state and
* be O(1). This is called from LogosAPIConsumer on every outbound
* RPC (sync + async invoke, informModuleToken{,_module}) as a
* peer-liveness guard to short-circuit calls to a transport whose
* peer is already known-gone. A blocking probe here would tax
* every cross-process call — never do a syscall, never do a
* round-trip, never do anything that can take a lock contended
* with the I/O thread. The expected shape is "return a bool field
* (or atomic load) that the transport's own completion handlers
* flip when the underlying socket signals an error."
*/
Comment thread docs/docs.md
Comment on lines +365 to +374
**Transport contract.** `LogosTransportConnection::isConnected()` is declared (in `logos_transport.h`) as a hot‑path call: implementations **must** return cached state and be O(1). No syscalls, no round‑trips, no contention with the I/O thread. The expected shape is "return a bool field (or atomic load) that the transport's own completion handlers flip when the underlying socket signals an error." All four current implementations satisfy this:

| Transport | Implementation of `isConnected()` |
| ------------- | ------------------------------------------------------------------ |
| `qt_remote` | `return m_connected;` — bool field updated by QRO connect events |
| `plain` | `m_connected && m_conn && m_conn->isOpen()` — atomic<bool> load |
| `qt_local` | `return true;` — in‑process registry, never disconnected |
| `mock` | reads `MockStore::isConnected()` (atomic, default true) |

The `plain` transport goes further: its `RpcConnection::sendCall` self‑gates internally (returns a `TRANSPORT_CLOSED` promise on `m_stopped`), so the guard at the SDK layer is belt‑and‑suspenders for it. For the `qt_remote` transport, the guard is load‑bearing — there is no internal self‑gate inside Qt Remote Objects.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants