ADR 0013 — Distributed inference topology: what AR sequentiality allows by FluffyAIcode · Pull Request #114 · FluffyAIcode/Kakeya-LLM-Inference-engine

FluffyAIcode · 2026-06-13T03:45:46Z

What

New ADR docs/adr/0013-distributed-inference-topology.md (plus a README index row). It pins down the can/can't-parallelize conclusion for distributed inference so that the conclusion isn't re-derived every time the "split one task across many verifiers" idea resurfaces.

Decision recorded

Governing constraint: single-sequence AR decode is sequential (token N+1 depends on token N), so a sequence's token chain cannot be split across independent parallel verifiers. This is a causal dependency, not an engineering gap.
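The following minimal Python sketch shows why the chain cannot be split. It assumes a toy deterministic `toy_step` as a stand-in for a model forward pass; `toy_step`, `ar_decode` and `naive_split` are hypothetical and not part of the engine. `naive_split` models the rejected topology. Each worker decodes its chunk from the prompt alone, because the tokens of earlier chunks do not exist yet when all workers start in parallel.

```python
from typing import Callable, List

Step = Callable[[List[int]], int]

def toy_step(prefix: List[int]) -> int:
    """Hypothetical stand-in for a forward pass: the next token depends on the whole prefix."""
    return (sum(prefix) * 31 + len(prefix)) % 97

def ar_decode(step: Step, prompt: List[int], n_new: int) -> List[int]:
    """Greedy AR decode: token N+1 is computed only after token N is in the context."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(step(seq))
    return seq[len(prompt):]

def naive_split(step: Step, prompt: List[int], n_new: int, n_workers: int) -> List[int]:
    """Hypothetical 'split one sequence across N independent verifiers'.
    Each worker decodes its chunk from the prompt alone, because the tokens of
    earlier chunks do not exist yet when all workers start in parallel."""
    chunk = n_new // n_workers
    out: List[int] = []
    for _ in range(n_workers):
        out.extend(ar_decode(step, prompt, chunk))
    return out

if __name__ == "__main__":
    seq = ar_decode(toy_step, [1, 2, 3], 8)
    split = naive_split(toy_step, [1, 2, 3], 8, 2)
    print("sequential:", seq)
    print("split:     ", split)
    print("agree on first chunk:", seq[:4] == split[:4], "| identical:", seq == split)
```

With `prompt=[1, 2, 3]`, `n_new=8` and `n_workers=2`, the two outputs agree on the first four tokens and then diverge. The second worker never saw tokens 1 to 4, so it cannot produce tokens 5 to 8.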

Topologies, mapped to feasibility:

| Topology | Realizable? | Notes |
|---|---|---|
| Split one sequence across N independent verifiers | ❌ No | Category error (AR sequentiality) |
| Single big verifier sharded across hosts | ✅ Yes | Tensor/pipeline parallel (`mlx.distributed model.shard`) |
| N proposers : 1 verifier (tree/multi-candidate) | ✅ Yes (the single-request throughput path) | Feasible on the ADR-0009 substrate but not built (current loop = single `RemoteProposer` + linear accept) |
| 1 proposer : N verifiers | ✅ Realized | Fleet multi-tenancy throughput, not single-task speedup |
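The following sketch shows an N:1 acceptance rule under greedy (argmax) verification. It uses the flat multi-candidate variant, not a tree. It assumes each proposer supplies one linear draft and that the verifier is a deterministic `verify_step` callable. All names are hypothetical and are not the engine's `RemoteProposer` API. `accept_linear` is a generic greedy linear-accept rule standing in for the current single-proposer loop. `accept_multi` keeps the longest accepted run across candidates. A real implementation would score all candidates in one batched or tree-attention verifier pass. The sequential loop here is only for clarity.

```python
from typing import Callable, List, Sequence

Step = Callable[[List[int]], int]

def accept_linear(verify_step: Step, prefix: List[int], draft: Sequence[int]) -> List[int]:
    """Greedy linear accept for ONE draft: keep draft tokens while they equal the
    verifier's greedy choice; at the first mismatch emit the verifier's token instead.
    If the whole draft is accepted, emit one bonus verifier token."""
    ctx = list(prefix)
    out: List[int] = []
    for tok in draft:
        target = verify_step(ctx)
        if tok != target:
            out.append(target)  # verifier correction ends the round
            return out
        out.append(tok)
        ctx.append(tok)
    out.append(verify_step(ctx))  # bonus token after a fully accepted draft
    return out

def accept_multi(verify_step: Step, prefix: List[int],
                 drafts: Sequence[Sequence[int]]) -> List[int]:
    """N proposers : 1 verifier. Score every candidate draft against the same prefix
    and keep the longest accepted run (ties go to the earliest candidate)."""
    best: List[int] = []
    for draft in drafts:
        run = accept_linear(verify_step, prefix, draft)
        if len(run) > len(best):
            best = run
    return best

def next_digit(ctx: List[int]) -> int:
    """Hypothetical toy verifier: the next token is the last token + 1 (mod 10)."""
    return (ctx[-1] + 1) % 10

if __name__ == "__main__":
    drafts = [[1, 2, 9], [1, 2, 3, 4], [7]]
    for d in drafts:
        print(d, "->", accept_linear(next_digit, [0], d))
    print("multi-candidate:", accept_multi(next_digit, [0], drafts))
```

Every round emits at least one verifier token. Adding candidates can therefore only lengthen the best accepted run per round, never shorten it. The price is extra verify work, which is where a sublinear verify(L) matters.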

The ADR also records:

- the two throughput regimes (single-request vs fleet-aggregate);
- the F3-latency caveat for multi-host spec-decode;
- an explicit note that the Mac bridge is the tool plane, not a production data plane.

It cites results/research/verify_l_sweep.json. The sublinear growth of verify(L) is the headroom that N:1 tree-spec would exploit.
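The following back-of-envelope model illustrates the two throughput regimes. Every number in it is hypothetical; none are taken from verify_l_sweep.json. It rests on three assumptions:

- A round costs propose time plus one cross-host round trip (`rtt_s`) plus verify time.
- Fleet verifiers serve independent requests.
- Verify cost follows the illustrative affine `toy_verify_cost`.

`verify_headroom` compares one verify pass over `l_big` tokens against `l_big / l_small` separate passes of `l_small` tokens. A result greater than 1 means the single pass is cheaper.

```python
from typing import Callable

def single_request_tps(accepted_per_round: float, propose_s: float,
                       rtt_s: float, verify_s: float) -> float:
    """Tokens/s for ONE request: rounds are strictly sequential (AR dependency)."""
    return accepted_per_round / (propose_s + rtt_s + verify_s)

def fleet_tps(per_request_tps: float, n_verifiers: int) -> float:
    """1 proposer : N verifiers, each verifier serving an independent request.
    Aggregate throughput scales with N; per-request tokens/s does not change."""
    return per_request_tps * n_verifiers

def verify_headroom(verify_cost: Callable[[int], float], l_small: int, l_big: int) -> float:
    """Cost of (l_big / l_small) separate passes of l_small tokens
    divided by the cost of one pass of l_big tokens."""
    return (l_big / l_small) * verify_cost(l_small) / verify_cost(l_big)

def toy_verify_cost(L: int) -> float:
    """Hypothetical cost model: fixed per-pass overhead plus a small per-token term."""
    return 0.040 + 0.001 * L

if __name__ == "__main__":
    per_req = single_request_tps(3.0, propose_s=0.02, rtt_s=0.03, verify_s=toy_verify_cost(4))
    print(f"single request: {per_req:.1f} tok/s")
    print(f"fleet of 4:     {fleet_tps(per_req, 4):.1f} tok/s (per request unchanged)")
    print(f"headroom 4->16: {verify_headroom(toy_verify_cost, 4, 16):.2f}x")
```

With a purely linear cost such as `lambda L: 0.01 * L`, the headroom is 1, so there is nothing for tree-spec to exploit. A fixed per-pass overhead is one simple way for verify(L) to grow sublinearly. In that case the ratio exceeds 1.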

On ADR 0009

I did not amend ADR 0009. It is Accepted, which makes it immutable except for its Status per the README convention, and it lives on the v05/b876 branch (that thread's territory). Instead, ADR 0013 serves as its topology companion/clarification, following the "resolve by writing a new ADR" convention. It cross-links ADR 0001/0008/0009/0012.

Notes

Status: Accepted; documentation-only.
Stacked on the ADR-0012 branch from PR #113 (Proposer/verifier value proposition: bounded-memory + recall; platform-forked throughput) so that the README index rows compose without conflict. This PR will be rebased onto main after #113 merges.

Testing

Documentation-only — no code paths affected.

Fix the can/can't-parallelize conclusion so it is not re-derived.

Governing constraint: single-sequence AR decode is sequential (token N+1 depends on N) -> a sequence's token chain cannot be split across independent parallel verifiers.

Topologies mapped:
- split one sequence across N independent verifiers: NOT possible (AR seq).
- single big verifier sharded across hosts: yes (mlx.distributed model.shard).
- N proposers : 1 verifier (tree/multi-candidate spec): the path to single-request throughput; feasible on ADR 0009 substrate but NOT built (current loop is single RemoteProposer + linear accept).
- 1 proposer : N verifiers: realized, but this is fleet multi-tenancy throughput, not single-task speedup.

Companion/clarification to ADR 0009 (not an amendment: 0009 is Accepted and on another branch). Cross-links ADR 0001/0008/0009/0012; cites verify_l_sweep.json (sublinear verify = the tree-spec headroom). README index updated.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

FluffyAIcode wants to merge 1 commit into AgentMemory/adr-0012-value-proposition-2815 from AgentMemory/adr-0013-distributed-topology-2815.