Peers on same coordinator cannot see each other — mesh state not syncing across harnesses #13

@claguna-venflow

Description

Two agents (Pi and Claude Code) connect to the same coordinator on localhost:19876 and both register successfully, but neither appears in the other's list_agents output.
Each sees only itself in the shared room.

Environment

  • OS: macOS
  • agent-comms version: 1.19.5 (both sides)
  • Harness A: Pi (npm: agent-comms@1.19.5)
  • Harness B: Claude Code (plugin: agent-comms@1.19.5)

Steps to Reproduce

  1. Install agent-comms 1.19.5 in both Pi and Claude Code
  2. Start Claude Code — it binds :19876 and becomes coordinator (PID 91611)
  3. Start Pi — it connects to existing coordinator (PID 91611)
  4. In Claude: agent_comms({ action: "register", name: "venflxapp-claude", visibility: "visible", tags: ["venflxapp"] })
  5. In Claude: agent_comms({ action: "join_room", room: "venflxapp" })
  6. In Pi: agent_comms({ action: "register", name: "venflxapp-pi", visibility: "visible", tags: ["venflxapp"] })
  7. In Pi: agent_comms({ action: "join_room", room: "venflxapp" })
  8. In Pi: agent_comms({ action: "list_agents" })

Expected Behavior

Pi should see venflxapp-claude in list_agents. Both should see 2 members in venflxapp room.
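
The expectation can be written as a small check to run against each harness's view. This is only a sketch: `missingAgents` is a hypothetical helper, and it assumes the list_agents output has already been reduced to a plain array of agent names, since the actual response shape is not shown in this report.

```typescript
// Hypothetical helper: which expected agent names are absent from one harness's view?
function missingAgents(expected: string[], seen: string[]): string[] {
  const seenSet = new Set(seen);
  return expected.filter((name) => !seenSet.has(name));
}

// Names registered in the reproduction steps.
const expectedAgents = ["venflxapp-claude", "venflxapp-pi"];

// Assumed views, based on the behavior reported here
// (Pi's ghost sessions are left out because their names are not known).
const piView = ["venflxapp-pi"];
const claudeView = ["venflxapp-claude"];

console.log("Pi is missing:", missingAgents(expectedAgents, piView));
console.log("Claude is missing:", missingAgents(expectedAgents, claudeView));
```

With a healthy mesh both calls would return an empty array.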

Actual Behavior

  • Pi sees only Pi agents (multiple ghost sessions), never Claude
  • Claude sees only itself, never Pi
  • Both see venflxapp room with 1 member (themselves)
  • TCP connections are ESTABLISHED on same coordinator PID:

lsof -i :19876

COMMAND PID USER FD TYPE NODE NAME
node 91611 ... 61u IPv4 ... TCP localhost:19876 (LISTEN) ← Claude = coordinator
node 40133 ... 56u IPv4 ... TCP 57006->19876 (ESTABLISHED) ← Pi peer connection
node 91611 ... 73u IPv4 ... TCP 19876->57006 (ESTABLISHED) ← Claude accepting Pi

What we verified (not the cause)

  • Not a version mismatch — both updated to 1.19.5
  • Not a port conflict — single coordinator PID 91611
  • Not a failed TCP connection (ESTABLISHED connections are visible in lsof)
  • Not failed registration — both receive "Registered as ..." success responses
  • Not visibility settings — both use visibility: "visible"

Hypothesis

Mesh state synchronization (agent registry, room membership) is not propagating between peers despite a successful TCP transport connection. Either:

  1. The state patch broadcast is silently failing
  2. Peers are in separate logical mesh partitions
  3. TLS certificate pinning prevents state exchange even though TCP is connected

Critical observation — Agent ID format mismatch:

Claude Code ID: oYp--AeG (short alphanumeric, random)
Pi ID: 4D:59:5B:E0:0B:CF:D0:C6:CC:1F:57:B1:C5:13:14:21:9E:D7:F5:73:D1:1B:BB:2B:FB:29:F8:57:22:64:47:D3
(32-byte SHA-256 hex with colons — deterministic, hardware-based)

This suggests the two harnesses use different ID generation schemes in their register handshake. If the coordinator validates or partitions agents by ID format, this would explain the isolated state despite shared TCP transport.
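
To make the format difference concrete, here is a minimal sketch that classifies the two IDs by their textual shape. `classifyAgentId` and the format labels are hypothetical names made up for illustration. Nothing here reflects how agent-comms actually generates or validates IDs; it only shows that the two observed IDs fall into different categories.

```typescript
// Hypothetical format labels, not part of agent-comms.
type IdFormat = "colon-hex-32" | "short-token" | "unknown";

function classifyAgentId(id: string): IdFormat {
  // 32 colon-separated hex byte pairs, the shape of a SHA-256 digest
  // rendered as a fingerprint.
  if (/^([0-9A-F]{2}:){31}[0-9A-F]{2}$/i.test(id)) return "colon-hex-32";
  // Short token made of letters, digits, "-" and "_" (length bounds are a guess).
  if (/^[A-Za-z0-9_-]{6,16}$/.test(id)) return "short-token";
  return "unknown";
}

// The two IDs observed in this report.
const claudeId = "oYp--AeG";
const piId =
  "4D:59:5B:E0:0B:CF:D0:C6:CC:1F:57:B1:C5:13:14:21:" +
  "9E:D7:F5:73:D1:1B:BB:2B:FB:29:F8:57:22:64:47:D3";

console.log("Claude Code:", classifyAgentId(claudeId));
console.log("Pi:", classifyAgentId(piId));
```

If the coordinator applies any check of this kind, agents from the two harnesses would land in different buckets.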

Additional observation — Messages not delivered:

// Claude sent:
agent_comms({ action: "send", target: "venflxapp", content: "..." })
// Response: Sent to venflxapp: 1779932641420-faz9zm

// Pi received: nothing

Room broadcast also fails — not just agent discovery.


Request

Please advise on additional diagnostics we can run, or confirm if this is a known issue.
