peer ipc hygiene#63
Open
dlipicar wants to merge 3 commits into
Open
Conversation
`RemoteTransportHost::unpublishObject` was a no-op even though `LogosAPIProvider` calls it during teardown — the QRO source-side state on the host's `QRemoteObjectRegistryHost` lived forever once published, leaving in-flight signals/replies dispatching through torn-down peer connections. One half of the hygiene gap the logos_host SIGBUS investigation surfaced (the other half lives in logos-capability-module's outbound informModuleToken on a possibly- dead peer). Track published objects in a name → QPointer<QObject> map so unpublishObject can look up the right QObject* and call disableRemoting() on it. QPointer guards against the rare case where the QObject is destroyed by its owner before unpublish runs (QRO's own destroyed() hook handles that path; we just need to not dereference a dangling pointer to call disableRemoting). This is a hygiene improvement, not a complete fix for the underlying Qt-Remote-Objects mid-disconnect serialiser race that started the SIGBUS — that race lives in QRO's CodecBase::send walking parameter metatypes while a peer's socket is being torn down on the same node. Cleaning up the source side promptly shrinks the window in which that race can fire. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every outbound RPC in the SDK funnels through one of four methods on LogosAPIConsumer: invokeRemoteMethod (sync), invokeRemoteMethodAsync, informModuleToken, informModuleToken_module. All four call m_transport->requestObject(...) then dispatch on the returned LogosObject* — bypassing the existing isConnected() check inside LogosAPIConsumer::requestObject's wrapper. A synchronous outbound on a transport whose peer is mid-teardown is the riskiest moment in the codebase. Observed as a Qt-Remote-Objects source-codec race that SIGBUS'd the entire logos_host when capability_module talked to a ui-host child that had just received SIGTERM (CodecBase::send dereferencing an invalidated metatype fn pointer during source-side teardown). Adds a cached-state isConnected() check at the top of each of those four methods, before any transport touch. On a known-disconnected peer the call short-circuits with a structured failure (default QVariant for sync, callback fires with default QVariant on the next event-loop turn for async, false for informModuleToken*) — strictly better than crashing the daemon. isConnected() is documented on the LogosTransportConnection base class as a hot-path contract: implementations MUST return cached state and be O(1). All four current transports (qt_remote, plain, qt_local, mock) already satisfy this — single field reads or literal true. This replaces the per-call-site guard previously added in logos-capability-module's CapabilityModuleImpl::requestModule. Every existing and future outbound RPC path now benefits without each call site having to remember to guard. The race is not eliminated — the socket can still tear down between the check and the wire write — but the window where the peer is already known-gone (the most common case) is closed. A proper fix upstream in Qt-Remote-Objects (CodecBase::send checking device validity before dereferencing the metatype table) would close the remaining sliver. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces a central “peer-liveness guard” to prevent outbound IPC/RPC calls from touching a transport that is already known-disconnected, and adds test and documentation coverage for the behavior. It also improves Qt Remote Objects host hygiene by making unpublishObject() actually disable remoting for previously published objects, and extends the mock transport so tests can force a disconnected state.
Changes:
- Add a transport
isConnected()short-circuit inLogosAPIConsumerfor sync/async invokes and token propagation calls. - Extend mock infrastructure (
MockStore/LogosMockSetup) to simulate disconnect/reconnect, and add a dedicated unit test suite covering the guard. - Improve
qt_remotehost-side cleanup by tracking published objects and callingdisableRemoting()inunpublishObject(); document the new guard and theisConnected()hot-path contract.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
cpp/logos_api_consumer.cpp |
Adds centralized peer-liveness checks to short-circuit outbound calls when transport is disconnected. |
cpp/logos_transport.h |
Documents the hot-path/O(1) contract for LogosTransportConnection::isConnected(). |
cpp/implementations/mock/mock_transport.cpp |
Makes mock isConnected() reflect MockStore state for guard testing. |
cpp/implementations/mock/mock_store.h |
Adds atomic mocked connection state and API to flip it. |
cpp/implementations/mock/mock_store.cpp |
Implements isConnected()/setConnected() and resets connection state on reset(). |
cpp/implementations/mock/logos_mock.h |
Exposes disconnect()/reconnect() helpers for tests via LogosMockSetup. |
cpp/implementations/qt_remote/remote_transport.h |
Adds bookkeeping for published objects to support proper unpublish. |
cpp/implementations/qt_remote/remote_transport.cpp |
Implements real unpublishObject() by disabling remoting on the tracked object. |
tests/sdk/test_peer_liveness_guard.cpp |
New tests validating guard behavior across sync/async invoke and token methods. |
tests/sdk/CMakeLists.txt |
Adds the new test file to the SDK test target. |
docs/docs.md |
Documents the peer-liveness guard behavior and transport expectations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+67
to
79
| * @brief Check if currently connected. | ||
| * | ||
| * Hot-path contract: implementations MUST return cached state and | ||
| * be O(1). This is called from LogosAPIConsumer on every outbound | ||
| * RPC (sync + async invoke, informModuleToken{,_module}) as a | ||
| * peer-liveness guard to short-circuit calls to a transport whose | ||
| * peer is already known-gone. A blocking probe here would tax | ||
| * every cross-process call — never do a syscall, never do a | ||
| * round-trip, never do anything that can take a lock contended | ||
| * with the I/O thread. The expected shape is "return a bool field | ||
| * (or atomic load) that the transport's own completion handlers | ||
| * flip when the underlying socket signals an error." | ||
| */ |
Comment on lines
+365
to
+374
| **Transport contract.** `LogosTransportConnection::isConnected()` is declared (in `logos_transport.h`) as a hot‑path call: implementations **must** return cached state and be O(1). No syscalls, no round‑trips, no contention with the I/O thread. The expected shape is "return a bool field (or atomic load) that the transport's own completion handlers flip when the underlying socket signals an error." All four current implementations satisfy this: | ||
|
|
||
| | Transport | Implementation of `isConnected()` | | ||
| | ------------- | ------------------------------------------------------------------ | | ||
| | `qt_remote` | `return m_connected;` — bool field updated by QRO connect events | | ||
| | `plain` | `m_connected && m_conn && m_conn->isOpen()` — atomic<bool> load | | ||
| | `qt_local` | `return true;` — in‑process registry, never disconnected | | ||
| | `mock` | reads `MockStore::isConnected()` (atomic, default true) | | ||
|
|
||
| The `plain` transport goes further: its `RpcConnection::sendCall` self‑gates internally (returns a `TRANSPORT_CLOSED` promise on `m_stopped`), so the guard at the SDK layer is belt‑and‑suspenders for it. For the `qt_remote` transport, the guard is load‑bearing — there is no internal self‑gate inside Qt Remote Objects. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.