ADR 0013 - Distributed inference topology: what AR sequentiality allows (#114)
Draft
FluffyAIcode wants to merge 1 commit into
Fix the can/can't-parallelize conclusion so it is not re-derived.

Governing constraint: single-sequence AR decode is sequential (token N+1 depends on N) -> a sequence's token chain cannot be split across independent parallel verifiers.

Topologies mapped:
- split one sequence across N independent verifiers: NOT possible (AR seq).
- single big verifier sharded across hosts: yes (mlx.distributed model.shard).
- N proposers : 1 verifier (tree/multi-candidate spec): the path to single-request throughput; feasible on ADR 0009 substrate but NOT built (current loop is single RemoteProposer + linear accept).
- 1 proposer : N verifiers: realized, but = fleet multi-tenancy throughput, not single-task speedup.

Companion/clarification to ADR 0009 (not an amendment - 0009 is Accepted and on another branch). Cross-links ADR 0001/0008/0009/0012; cites verify_l_sweep.json (sublinear verify = the tree-spec headroom). README index updated.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
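The governing constraint and the topology split above can be illustrated with a toy sketch. Everything in it is hypothetical: ar_decode, spec_decode_linear, fleet_decode and the integer "models" are illustrative stand-ins (greedy decoding assumed), not this repo's RemoteProposer or the mlx.distributed API. It shows that plain AR decode is a sequential loop, that 1:1 speculative decode with linear (prefix) acceptance reproduces exactly the AR token chain and only changes how many verifier passes it needs, and that fleet-style parallelism is parallelism over independent sequences, not within one.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Token = int
# Hypothetical toy "model": maps the full prefix to the next token (greedy).
# Real verifiers return logits; this only models the data dependency.
NextToken = Callable[[List[Token]], Token]


def ar_decode(model: NextToken, prompt: List[Token], n: int) -> List[Token]:
    """Plain autoregressive decode. Each step consumes the token from the
    previous step, so one sequence's steps cannot go to independent workers."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(model(seq))  # depends on every token appended so far
    return seq[len(prompt):]


def spec_decode_linear(
    verifier: NextToken,
    proposer: NextToken,
    prompt: List[Token],
    n: int,
    draft_len: int,
) -> Tuple[List[Token], int]:
    """1 proposer : 1 verifier speculative decode with linear (prefix) accept.

    Returns (generated tokens, number of verifier passes). The output always
    equals ar_decode(verifier, ...): speculation changes the pass count,
    not the token chain.
    """
    seq = list(prompt)
    passes = 0
    while len(seq) - len(prompt) < n:
        # Drafting is sequential too: each draft token extends the previous one.
        draft: List[Token] = []
        for _ in range(draft_len):
            draft.append(proposer(seq + draft))
        # One verifier pass scores every draft position against the same prefix
        # (would be a single batched forward on real hardware; simulated here).
        passes += 1
        for i in range(draft_len + 1):
            target = verifier(seq + draft[:i])
            if i == draft_len or draft[i] != target:
                seq.extend(draft[:i])  # accept the agreeing prefix
                seq.append(target)     # plus the verifier's own next token
                break
    return seq[len(prompt):len(prompt) + n], passes


def fleet_decode(
    model: NextToken, prompts: List[List[Token]], n: int
) -> List[List[Token]]:
    """Fleet-aggregate throughput: independent requests run concurrently,
    but each request is still one sequential ar_decode chain."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda p: ar_decode(model, p, n), prompts))
```

With a proposer that always agrees with the verifier and draft_len=4, ten tokens take two verifier passes; with one that never agrees, ten passes. Either way the tokens match ar_decode. On real hardware the pass over L draft positions is one batched forward, and per the PR the cited verify_l_sweep.json shows verify(L) cost is sublinear in L, which is the headroom an N:1 tree scheme would exploit; this sketch does not model that cost or tree acceptance.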
What
New ADR docs/adr/0013-distributed-inference-topology.md (+ README index row) that fixes the can/can't-parallelize conclusion for distributed inference, so it isn't re-derived each time the "split one task across many verifiers" idea resurfaces.

Decision recorded
Governing constraint: single-sequence AR decode is sequential (token N+1 depends on N) → a sequence's token chain cannot be split across independent parallel verifiers. That's a causal dependency, not an engineering gap.

Topologies, mapped to feasibility:
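- Split one sequence across N independent verifiers: not possible (AR sequentiality).
- Single big verifier sharded across hosts: yes (mlx.distributed model.shard).
- N proposers : 1 verifier (tree/multi-candidate spec): the path to single-request throughput; feasible on the ADR 0009 substrate but not built (the current loop is a single RemoteProposer + linear accept).
- 1 proposer : N verifiers: realized, but this is fleet multi-tenancy throughput, not single-task speedup.

The ADR also records the two throughput regimes (single-request vs fleet-aggregate), the F3-latency caveat for multi-host spec-decode, and an explicit note that the Mac bridge is the tool plane, not a production data plane. It cites results/research/verify_l_sweep.json (sublinear verify(L) is the headroom N:1 tree-spec would exploit).

On ADR 0009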
I did not amend ADR 0009: it's Accepted (immutable except Status, per the README convention) and lives on the v05/b876 branch (that thread's territory). ADR 0013 is its topology companion/clarification instead, per the "resolve by writing a new ADR" convention. Cross-links ADR 0001/0008/0009/0012.

Notes
main after ADR 0012 - Proposer/verifier value proposition (bounded-memory + recall; platform-forked throughput) #113 merges.

Testing
Documentation-only — no code paths affected.