
ADR 0013 — Distributed inference topology: what AR sequentiality allows #114

Draft
FluffyAIcode wants to merge 1 commit into AgentMemory/adr-0012-value-proposition-2815 from AgentMemory/adr-0013-distributed-topology-2815


@FluffyAIcode

Owner

What

New ADR docs/adr/0013-distributed-inference-topology.md (plus a README index row) that settles the can/can't-parallelize question for distributed inference, so it isn't re-derived every time the "split one task across many verifiers" idea resurfaces.

Decision recorded

Governing constraint: single-sequence autoregressive (AR) decode is sequential (token N+1 depends on token N), so a sequence's token chain cannot be split across independent parallel verifiers. That is a causal dependency, not an engineering gap.
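A minimal toy sketch of the constraint, assuming a deterministic greedy decoder. `toy_step` is a hypothetical stand-in for one model forward pass (next token = sum of the context mod 7), not anything in this repo. Each loop iteration needs the token appended by the previous one; "verifiers" that each see only the prompt all emit the same first token instead of the chain.

```python
from typing import Callable, List

Step = Callable[[List[int]], int]


def toy_step(ctx: List[int]) -> int:
    """Hypothetical stand-in for one verifier forward pass (greedy argmax)."""
    return sum(ctx) % 7


def greedy_decode(step: Step, prompt: List[int], n_new: int) -> List[int]:
    """Single-sequence AR decode: token N+1 is computed from a context that already holds token N."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(step(seq))  # cannot start until the previous append has happened
    return seq


def split_across_independent(step: Step, prompt: List[int], n_workers: int) -> List[int]:
    """The infeasible topology: each worker sees only the prompt, never its predecessors' tokens."""
    return [step(list(prompt)) for _ in range(n_workers)]


if __name__ == "__main__":
    print(greedy_decode(toy_step, [1, 2], 4)[2:])        # [3, 6, 5, 3]
    print(split_across_independent(toy_step, [1, 2], 4))  # [3, 3, 3, 3]
```

The split version is fast but wrong: without token N in its context, a worker cannot produce token N+1.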

Topologies, mapped to feasibility:

| Topology | Realizable? | Notes |
|---|---|---|
| Split one sequence across N independent verifiers | ❌ No | Category error (AR sequentiality) |
| Single big verifier sharded across hosts | ✅ Yes | Tensor/pipeline parallel (mlx.distributed model.shard) |
| N proposers : 1 verifier (tree/multi-candidate) | ✅ Yes: the single-request throughput path | Feasible on the ADR 0009 substrate, not built (current loop = single RemoteProposer + linear accept) |
| 1 proposer : N verifiers | ✅ Realized | Fleet multi-tenancy throughput, not single-task speedup |
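A toy sketch contrasting the current linear accept with N:1 multi-candidate acceptance, assuming greedy verification. `toy_step`, `linear_accept`, and `multi_candidate_accept` are hypothetical illustrations, not the RemoteProposer code path. The per-position loop stands in for what a real verifier scores in one batched forward pass.

```python
from typing import Callable, List, Sequence

Step = Callable[[List[int]], int]


def toy_step(ctx: List[int]) -> int:
    """Hypothetical verifier: greedy next token = sum of context mod 7."""
    return sum(ctx) % 7


def linear_accept(verify: Step, ctx: List[int], draft: Sequence[int]) -> List[int]:
    """Greedy linear acceptance for one draft.

    Keeps the longest draft prefix the verifier agrees with, then appends the
    verifier's own token: a correction at the first mismatch, or a bonus
    token after a fully accepted draft.
    """
    out: List[int] = []
    for tok in draft:
        target = verify(ctx + out)
        if tok != target:
            out.append(target)  # correction token; the rest of the draft is discarded
            return out
        out.append(tok)
    out.append(verify(ctx + out))  # bonus token after a fully accepted draft
    return out


def multi_candidate_accept(
    verify: Step, ctx: List[int], drafts: Sequence[Sequence[int]]
) -> List[int]:
    """N proposers : 1 verifier. Verify every candidate and keep the longest accepted run."""
    return max((linear_accept(verify, ctx, d) for d in drafts), key=len)


if __name__ == "__main__":
    ctx = [1, 2]
    drafts = [[3, 6, 0], [3, 6, 5], [4]]  # hypothetical proposals from 3 proposers
    print(linear_accept(toy_step, ctx, drafts[0]))        # [3, 6, 5]
    print(multi_candidate_accept(toy_step, ctx, drafts))  # [3, 6, 5, 3]
```

Every accepted run is a prefix of the plain greedy chain, so extra candidates change tokens-per-round, not the output. A tree-spec verifier would typically share common prefixes between candidates and score them together in one pass; this sketch verifies candidates separately for clarity.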

The ADR also covers the two throughput regimes (single-request vs. fleet-aggregate), the F3-latency caveat for multi-host spec-decode, and an explicit note that the Mac bridge is the tool plane, not a production data plane. It cites results/research/verify_l_sweep.json: the sublinear verify(L) curve there is the headroom that N:1 tree-spec would exploit.
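A back-of-envelope model of the two throughput regimes, under loudly stated assumptions: verify cost is modelled as a hypothetical affine `fixed_s + per_token_s * L` (cheaper per position as L grows), `hop_s` is an assumed per-round network hop between proposer and verifier hosts, every number is illustrative and not taken from verify_l_sweep.json, and fleet verifiers are assumed fully independent.

```python
def verify_cost_s(L: int, fixed_s: float = 0.020, per_token_s: float = 0.001) -> float:
    """Hypothetical verify(L): fixed per-pass cost plus a small per-position cost."""
    return fixed_s + per_token_s * L


def single_request_tps(accepted_per_round: float, L: int, draft_s: float, hop_s: float) -> float:
    """Tokens/s for ONE request; each round is draft -> network hop -> verify(L)."""
    return accepted_per_round / (draft_s + hop_s + verify_cost_s(L))


def fleet_tps(per_request_tps: float, n_verifiers: int) -> float:
    """1 proposer : N verifiers. Aggregate throughput scales with N; per-request speed does not."""
    return per_request_tps * n_verifiers


if __name__ == "__main__":
    linear = single_request_tps(2.0, L=10, draft_s=0.005, hop_s=0.005)
    tree = single_request_tps(3.0, L=20, draft_s=0.005, hop_s=0.005)
    print(round(linear, 1), round(tree, 1))  # 50.0 60.0
    print(round(fleet_tps(linear, 4), 1))    # 200.0
```

With these made-up numbers, doubling L from 10 to 20 stretches a round from 40 ms to 50 ms, so a tree that lifts mean accepted tokens from 2.0 to 3.0 wins on single-request throughput. Adding verifiers, by contrast, only raises the fleet total.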

On ADR 0009

I did not amend ADR 0009: it is Accepted (immutable except for Status, per the README convention) and lives on the v05/b876 branch, which is that thread's territory. Instead, ADR 0013 serves as its topology companion/clarification, following the "resolve by writing a new ADR" convention. It cross-links ADRs 0001, 0008, 0009, and 0012.

Notes

Testing

Documentation-only — no code paths affected.


Fix the can/can't-parallelize conclusion so it is not re-derived. Governing
constraint: single-sequence AR decode is sequential (token N+1 depends on N) ->
a sequence's token chain cannot be split across independent parallel verifiers.

Topologies mapped:
- split one sequence across N independent verifiers: NOT possible (AR seq).
- single big verifier sharded across hosts: yes (mlx.distributed model.shard).
- N proposers : 1 verifier (tree/multi-candidate spec): the path to single-
  request throughput; feasible on ADR 0009 substrate but NOT built (current loop
  is single RemoteProposer + linear accept).
- 1 proposer : N verifiers: realized, but it delivers fleet multi-tenancy
  throughput, not single-task speedup.

Companion/clarification to ADR 0009 (not an amendment - 0009 is Accepted and on
another branch). Cross-links ADR 0001/0008/0009/0012; cites verify_l_sweep.json
(sublinear verify = the tree-spec headroom). README index updated.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>