Skip to content

Drive resolver elicitation over the 2026-07-28 input_required flow#2986

Merged
maxisbey merged 10 commits into
mainfrom
resolver-input-required
Jun 29, 2026
Merged

Drive resolver elicitation over the 2026-07-28 input_required flow#2986
maxisbey merged 10 commits into
mainfrom
resolver-input-required

Conversation

@Kludex

@Kludex Kludex commented Jun 26, 2026

Copy link
Copy Markdown
Member

Summary

#2969 added the Resolve/Elicit resolver dependency-injection API for @mcp.tool() functions, eliciting via a synchronous ctx.elicit() request mid-call - the 2025 model, which doesn't exist at 2026-07-28 (sampling/roots/elicitation are only embedded in InputRequiredResult there).

This PR makes the resolver Elicit[T] path negotiate the transport by protocol version, so the same author-facing code works on both:

  • >= 2026-07-28 - the framework returns an InputRequiredResult carrying the batched questions and resumes when the client retries call_tool(..., input_responses=..., request_state=...). Independent resolvers are asked together in one round; a resolver that depends on another's answer is asked in a later round.
  • <= 2025-11-25 - it keeps the synchronous elicitation/create request mid-call (no regression for current clients).

Author-facing code is unchanged - resolvers still return T / Elicit[T]; the framework drives the multi-round loop. And since Client.call_tool auto-drives InputRequiredResult retries through the client's normal callbacks (SEP-2322, #2998), a client with an elicitation_callback handles resolver questions transparently on either protocol version - several tests here prove that loop end-to-end.

async def confirm_delete(path: str) -> Confirm | Elicit[Confirm]:
    if not list_files(path):
        return Confirm(ok=True)            # empty folder: no question, no round-trip
    return Elicit(f"{path} has files. Delete?", Confirm)

@mcp.tool()
async def delete_folder(path: str, confirm: Annotated[ElicitationResult[Confirm], Resolve(confirm_delete)]) -> str:
    match confirm:
        case AcceptedElicitation(data=Confirm(ok=True)): ...   # delete
        case DeclinedElicitation(): ...                        # user declined the prompt
        case CancelledElicitation(): ...                       # user cancelled

How it works

  • resolve_arguments becomes a resumable DAG walk: it reads ctx.input_responses / ctx.request_state, runs each resolver, and collects any Elicit it can't yet answer into a pending map. If anything is pending, Tool.run returns the InputRequiredResult instead of running the body.
  • Only elicited answers persist in request_state across rounds, keyed by a worker-stable module:qualname wire key, so each question is asked exactly once per logical call and two consumers of one resolver share one question. A resolver that resolves without eliciting is pure and may re-run on each round; an eliciting resolver runs again to consume its answer.
  • Persisted entries store the exact content the client sent (which already passed validation) - encode and restore are the identity on the client's bytes, so no alias or serialization configuration can break the round-trip. Restored entries are re-validated into the resolver's Elicit[T] model; request_state is client-supplied, so an entry that fails validation is treated as no-progress and the question is re-asked (the same stance the state decoder takes for a malformed envelope). A fresh answer whose content doesn't match the requested schema is an explicit tool error - on both transports (the legacy path previously let the raw pydantic ValidationError text reach the client; that change is noted in the migration guide).
  • A return annotation with more than one distinct Elicit arm (-> Elicit[A] | Elicit[B]) raises InvalidSignature at registration, like cycles and unclassifiable parameters: the restore path validates against one statically-known schema, so the ambiguous shape either never converges or silently injects a wrong-typed model when the shapes overlap.
  • The embedded elicitation is gated on the client's declared capability, per the spec's MUST: a client that didn't declare (form) elicitation support gets a -32021 missing-required-client-capability error with the requiredCapabilities payload, instead of questions it can't render.
  • New render_elicitation_schema() helper in elicitation.py (shared by elicit_with_validation and the input_requests rendering). The legacy path also now reports accept-with-no-content explicitly instead of an "unexpected action" error.

Docs & examples

  • The Dependencies tutorial's "Ask when you must" section explains the version-negotiated transport, scopes the run-once guarantee to what is actually true (each question is asked once; an eliciting resolver runs again to consume its answer; a non-eliciting resolver may re-run when a call resumes), and documents the determinism constraint (a question must derive from tool args and earlier answers - per-round generated values like a default_factory id don't bind across rounds). Multi-round-trip requests no longer claims @mcp.tool() has no high-level path to InputRequiredResult - resolver DI is that path now.
  • examples/stories/refund_desk flips from era = "legacy" to era = "dual" - the manifest comment said to do exactly that once this capability landed - so the story runs the full matrix on both protocol generations with identical author code.

Out of scope (follow-ups)

  • request_state is client-trusted here: forged entries can't smuggle unvalidated or wrongly-typed values into tool parameters (restore validation, above), but a client can still fabricate a valid-shaped outcome (e.g. a confirmation it never showed the user) - the same thing a lying client can do by answering the live question, on any transport. What sealing adds is integrity against third-party tampering and replay/cross-request binding; HMAC-sealing request_state with a per-server key is a separate PR (the typescript-sdk staged it the same way).

Tests

tests/server/mcpserver/test_resolve.py: the MRTR loop is tested both manually (per-round wire shapes via client.session.call_tool(..., allow_input_required=True): first-round question shape, independent questions batched into one round, factory closures with distinct wire keys, non-elicitation response) and end-to-end through the client auto-driver (accept/decline/cancel outcomes, dependent chains asked across rounds with each question answered exactly once, deep chains, restored typed models, accept-with-no-content). Hardening tests: forged request_state (non-dict and schema-failing data) re-asks instead of erroring or injecting, declines persist across rounds, unknown wire keys are ignored, aliased and divergent-alias models round-trip, schema-mismatched fresh answers produce the explicit error on both transports, an 11-deep chain trips the client's input_required_max_rounds, multi-arm Elicit annotations are rejected at registration, and the -32021 capability gate. The synchronous legacy path keeps its own legs, and the docs/story suites run both transports. 100% branch coverage.

AI Disclaimer

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 5 files

Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic

Comment thread src/mcp/server/mcpserver/resolve.py Outdated
Comment thread src/mcp/server/mcpserver/resolve.py Outdated
Comment thread src/mcp/server/mcpserver/resolve.py Outdated
Comment thread src/mcp/server/mcpserver/resolve.py Outdated
Comment thread src/mcp/server/mcpserver/resolve.py Outdated
Comment thread src/mcp/server/mcpserver/resolve.py Outdated
Comment thread src/mcp/server/mcpserver/resolve.py
Comment thread src/mcp/server/mcpserver/resolve.py Outdated
Comment thread src/mcp/server/mcpserver/resolve.py Outdated
Comment thread src/mcp/server/mcpserver/resolve.py Outdated
Comment thread tests/server/mcpserver/test_resolve.py Outdated
@Kludex

Kludex commented Jun 26, 2026

Copy link
Copy Markdown
Member Author

Reviewer's guide

Stacked on #2969 (review that first). Its base is the resolver-dependency-injection branch, so this diff is only the MRTR delta. #2969 elicits synchronously (ctx.elicit(), works at <= 2025-11-25); this PR makes the same Resolve/Elicit author API work at 2026-07-28, where elicitation only exists embedded in InputRequiredResult (no standalone server-to-client requests).

Where to look

  • src/mcp/server/mcpserver/resolve.py — almost all of it. resolve_arguments becomes a resumable DAG walk; _resolve/_elicit collect pending elicitations; _State/_encode_state/_decode_state are the request_state codec; _state_key/build_resolver_plans assign the wire keys.
  • tools/base.pyTool.run now returns the InputRequiredResult when resolution is incomplete (convert_result already passes it through).
  • context.pyContext.protocol_version (None outside a request, used to pick the transport).

Author-facing code is unchanged — resolvers still return T / Elicit[T]; the framework drives the multi-round loop.

Design decisions worth a second look

  1. request_state is client-trusted. It round-trips through the (untrusted) client carrying the user's elicited answers; a malicious client could forge an "accept." HMAC-sealing is deliberately a follow-up PR — please confirm shipping unsealed first is acceptable. (It's validated for shape via pydantic, just not authenticated.)
  2. Only elicited outcomes persist across rounds. A resolver that resolves without eliciting is pure and re-runs each round (so it must be idempotent). This intentionally narrows "exactly-once" to elicitations — it's what makes non-serializable resolver returns safe and avoids type-loss across the round-trip.
  3. Negotiated dual transport. >= 2026-07-28InputRequiredResult loop; <= 2025-11-25 → synchronous ctx.elicit(). No regression for current clients.
  4. Client loop is caller-driven. Client.call_tool surfaces the InputRequiredResult (with allow_input_required=True); there's no auto-loop driver yet (existing upstream TODO). Tests drive it via a _drive_mrtr helper.

Wire-key correctness (the subtle part): resolvers are deduped in-call by _resolver_key (instance-distinct), but keyed on the wire by a deterministic module:qualname (with #N disambiguation for colliding closures) so keys stay stable across rounds/workers. See _state_key + build_resolver_plans.

Reviewed by cubic + Claude + a bughunter pass + a Codex pass (which caught two real multi-round state bugs, now fixed). CI green, 100% coverage, no casts.

Kludex added a commit that referenced this pull request Jun 26, 2026
…g resolvers

Review follow-ups on #2986:
- _state_key carries no id(...): it is module:qualname (a callable object uses its
  type's), so request_state round-trips and resumes on any worker (stateless HTTP).
  Two resolvers sharing that base (method instances, factory closures) are already
  disambiguated deterministically at registration (#N), so dropping the id is safe.
- An eliciting resolver whose annotation lacks an Elicit[T] arm has elicit_schema
  None; its answer restored from request_state is now re-validated against the live
  Elicit.schema (via _elicit consulting res.state) instead of injecting a raw dict.
- Move the datetime import to module top (AGENTS.md).

Add regression tests: an unannotated eliciting resolver in a multi-round flow, and
worker-stable wire keys for method instances and callable objects.
Comment thread docs/migration.md Outdated
Comment thread tests/server/mcpserver/test_resolve.py Outdated
Comment thread src/mcp/server/mcpserver/tools/base.py
Base automatically changed from resolver-dependency-injection to main June 29, 2026 10:51
Kludex and others added 8 commits June 29, 2026 12:17
Resolvers that return Elicit[T] now negotiate the transport by protocol version:
at >= 2026-07-28 the framework returns an InputRequiredResult carrying the
batched questions and resumes when the client retries with
input_responses/request_state; at <= 2025-11-25 it keeps the synchronous
ctx.elicit() request. Author-facing code (Resolve/Elicit) is unchanged.

resolve_arguments becomes a resumable DAG walk: it reads ctx.input_responses /
ctx.request_state, memoizes resolver outcomes by a process-stable module:qualname
key, batches independent pending elicitations into one round, serializes
dependent ones across rounds, and carries resolved outcomes in request_state so
each resolver resolves once per logical call. Outcomes restored from
request_state are re-validated into their model via the Elicit[T] return arm.
request_state is client-trusted for now (HMAC sealing is a follow-up).

Add a render_elicitation_schema helper to elicitation.py, MRTR-loop and codec
tests, and document the transport in the migration guide.
Replace every cast() with a checked or properly-typed alternative: model the
request_state payload with pydantic (_State/_StateEntry) so the untrusted JSON is
validated instead of cast; type _Resolution.pending as InputRequests so an
ElicitRequest fits without a cast; add a _is_elicit TypeGuard and an _accepted
helper that carry the right types; and narrow req.params via isinstance in the
test helper. No behavior change.
- In-call cache keyed by _resolver_key (instance-distinct) again; _state_key adds
  id(__self__) for the wire key, so two instances of one bound method no longer
  collide and silently share an outcome.
- resolve_arguments reads ctx.protocol_version (new Context property, None outside
  a request) instead of dereferencing request_context, so direct
  MCPServer.call_tool() works for tools whose resolvers never elicit.
- request_state persists only elicited outcomes (always validated models); a
  resolver that resolves without eliciting is pure and re-runs each round. Fixes
  the json.dumps crash on non-serializable returns (datetime/set/...) and the
  dict-degradation of restored values.
- _elicit_return_schema handles a bare Elicit[T] return (not only unions).
- _INPUT_REQUIRED_VERSION pinned to '2026-07-28' instead of LATEST_MODERN_VERSION.
- accept with no content raises ToolError instead of silently reporting cancel.
- Independent nested resolver deps batch into one round (catch _Pending per dep).
- Test cleanup: drop dead helpers, hoist CreateMessageResult import.

Add regression tests for each; document the narrowed elicited-only persistence.
- Carry restored answers forward: an elicited outcome restored from request_state
  is now re-added to res.elicited, so in a 4+-round dependency chain an early
  answer is not dropped from request_state and re-asked on a later round.
- Collision-free wire keys: assign each resolver a deterministic wire key at
  registration (module:qualname, disambiguated with #N when bases collide), so
  two distinct closures from one factory get separate questions/outcomes instead
  of sharing one. _state_key is now only the base-key source at registration.

Add regression tests: a deep chain asserting an early answer is asked once, and
factory closures asserting distinct wire keys and correct injected values.
…g resolvers

Review follow-ups on #2986:
- _state_key carries no id(...): it is module:qualname (a callable object uses its
  type's), so request_state round-trips and resumes on any worker (stateless HTTP).
  Two resolvers sharing that base (method instances, factory closures) are already
  disambiguated deterministically at registration (#N), so dropping the id is safe.
- An eliciting resolver whose annotation lacks an Elicit[T] arm has elicit_schema
  None; its answer restored from request_state is now re-validated against the live
  Elicit.schema (via _elicit consulting res.state) instead of injecting a raw dict.
- Move the datetime import to module top (AGENTS.md).

Add regression tests: an unannotated eliciting resolver in a multi-round flow, and
worker-stable wire keys for method instances and callable objects.
…t driver

- request_state entries are validated unconditionally when the resolver's
  model is known; an entry that fails validation is dropped and the question
  re-asked, matching _decode_state's malformed-means-no-progress stance.
  Accepted models are encoded by alias so the stored shape round-trips
  through the same validation the client's fresh answer passed.
- Recorded state now wins over a re-sent answer on both restore paths
  (previously only resolvers without an Elicit[T] return arm honored a
  changed answer).
- A fresh answer whose content does not match the requested schema raises an
  explicit ToolError instead of leaking pydantic internals.
- Embedded elicitation is gated on the client's declared form-elicitation
  capability: a client that did not declare it gets -32021 with the
  requiredCapabilities payload (the spec says servers MUST NOT send
  inputRequests the client has not declared support for).
- Legacy parity: elicit_with_validation reports accept-with-no-content
  explicitly; the unexpected-action arm is gone (the action literal makes
  the remainder provably cancel).
- uses_input_required -> _uses_input_required (module-internal).
- Tests drive the loop through the real client surfaces - manual wire shapes
  via session.call_tool(..., allow_input_required=True), outcomes end-to-end
  through Client.call_tool's auto-driver - plus regressions for forged
  state, declined-outcome persistence, unknown keys, aliased models,
  schema-mismatched answers and the rounds cap.
The manifest pinned the story to era="legacy" until resolver elicitation
could ride the 2026-07-28 input_required round-trip; that is now the case,
so the story runs the full transport x era matrix with identical author
code. README caveats describe the per-era transport and what persists
across rounds; the spec links point at the real anchor
(#input-required-tool-results), fixing the same stale fragment in the mrtr
story; legacy_elicitation's cross-reference no longer claims resolver DI
rides push elicitation everywhere.
The Dependencies tutorial explains that the framework picks the question's
transport from the negotiated protocol version, scopes the run-once claim
to what is guaranteed (each question asked once; a non-eliciting resolver
may re-run when a call resumes), and documents the determinism constraint
on questions. Multi-round-trip requests no longer claims @mcp.tool() has
no high-level path to InputRequiredResult - resolver dependencies are that
path. The eliciting tutorial snippets are tested on both transports.
@maxisbey maxisbey force-pushed the resolver-input-required branch from a0eb013 to 85756a7 Compare June 29, 2026 12:20
maxisbey added 2 commits June 29, 2026 13:24
… returns

request_state entries now store exactly the content the client sent (which
already passed validation) instead of re-deriving it from the validated model
with model_dump - persistence and restore previously used two codecs that
diverge whenever validation and serialization aliases differ, dropping a
legitimate stored answer on the third round and re-asking forever. Encode and
restore are now the identity on the client's bytes, so no serialization knob
can break the round-trip; the earlier by-alias dump is gone with the rest of
the model-dumping path.

A return annotation with more than one distinct Elicit arm now raises
InvalidSignature at registration: the restore path validates against the
single statically-known schema, so `-> Elicit[A] | Elicit[B]` either never
converges (the stored B answer fails A validation each round) or silently
injects a wrong-typed model when the shapes happen to overlap. Like cycles
and unclassifiable parameters, the ambiguous shape is rejected up front.

Legacy parity: elicit_with_validation wraps a schema-mismatched accepted
answer in the same stable error the 2026 path uses, instead of letting the
raw pydantic ValidationError text reach the client; noted in the migration
guide since callers catching ValidationError see ValueError now.
Each question is asked exactly once per call, but on the multi-round-trip
form an eliciting resolver's body runs again to consume its answer, and a
resolver that answered without asking may run again whenever the call
resumes. The tutorial info box, its recap bullet, and the refund_desk
caveats now say exactly that, so authors don't hang side effects on a
runs-at-most-once reading that only holds for the synchronous form.
@maxisbey maxisbey merged commit c85836a into main Jun 29, 2026
34 checks passed
@maxisbey maxisbey deleted the resolver-input-required branch June 29, 2026 13:39
Comment on lines 136 to +138
* `Annotated[T, Resolve(fn)]` on a tool parameter: the SDK runs `fn` and injects its return value.
* A resolved parameter is invisible to the model and cannot be supplied by a client. Values the model must not invent - prices, identities, permissions - belong here.
* A resolver's parameters are resolved the same way: the `Context`, another `Resolve(...)`, or a tool argument by name. The graph runs each resolver at most once per call.
* A resolver's parameters are resolved the same way: the `Context`, another `Resolve(...)`, or a tool argument by name. The graph runs each resolver at most once per round, however many consumers it has; each question is asked exactly once, an eliciting resolver runs again to consume its answer, and a resolver that never asked may run again when a call resumes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Earlier sections of this page still state the old per-call guarantee — line 61 ("the SDK runs the resolver at most once per call, no matter how many declare it") and line 73 ("it runs once per call. One inventory lookup, two consumers") — which now contradicts the per-round contract this PR introduces in the new !!! info box and the rewritten Recap bullet on the same page. Reword those earlier statements (and the "Don't take once-per-call on faith" / "Once per call means exactly that" tips) to the per-round / asked-once phrasing, or add a forward reference to the info box, so an author doesn't write a non-idempotent resolver relying on once-per-call.

Extended reasoning...

What the issue is. docs/tutorial/dependencies.md is now internally inconsistent about the resolver run-once guarantee. This PR added the !!! info box (lines ~119-132: "a resolver that answered without asking, like check_stock, may run again whenever the call resumes after a question") and rewrote the Recap bullet (line ~138: "at most once per round ... a resolver that never asked may run again when a call resumes"). But the earlier sections of the same page were left with the old contract: line 61 says "every tool that needs stock declares the same parameter, and the SDK runs the resolver at most once per call, no matter how many declare it", line 73 says "it runs once per call. One inventory lookup, two consumers", and the nearby tips reinforce it ("Don't take once-per-call on faith ... one line per call"; "Once per call means exactly that").

Why the old wording is no longer universally true. Under the >= 2026-07-28 input_required flow, only elicited outcomes are persisted in request_state; a resolver that resolves without eliciting is pure and re-runs on every retry round whenever the tool also has an eliciting resolver. The PR's own tests assert exactly this: test_auto_driver_answers_independent_questions_in_a_single_round counts rounds == 2 via a pure resolver that re-runs each round, and test_input_required_resolver_asks_and_consumes_then_never_reruns shows an eliciting resolver running twice (ask + consume). The page's own tutorial003 example (the eliciting confirm_backorder) is now exercised on mode="auto" by tests/docs_src/test_dependencies.py, so the multi-round behaviour applies to the very examples this page teaches.

Step-by-step proof. Take the page's tutorial003 shape extended with the tutorial002 dependency: a tool with stock: Annotated[Stock, Resolve(check_stock)] and backorder: Annotated[Backorder, Resolve(confirm_backorder)] where the title is out of stock, on a 2026-07-28 connection. Round 1: check_stock runs (lookup #1), confirm_backorder returns Elicit(...) with no answer yet, so the server returns an InputRequiredResult and only the (empty) elicited-outcome map goes into request_statecheck_stock's value is not persisted. Round 2: the client retries with the answer; check_stock runs again (lookup #2), confirm_backorder consumes its answer, the body runs. One logical tools/call, two executions of the non-eliciting resolver — directly contradicting "at most once per call" / "one inventory lookup".

Why this looks like an oversight rather than intent. The author reworded the analogous sentences elsewhere in this PR — examples/stories/refund_desk/README.md ("ask each question at most once per call"), refund_desk/client.py comments, the resolve.py module docstring, and this page's own Recap bullet — and the follow-up commit 1a15cb6 ("Scope the resolver run-once guarantee to questions, not resolver bodies") shows this exact wording class is considered worth correcting. Lines 61/73/77 and the "Once per call means exactly that" tip were simply missed. The earlier-flagged stale wording in docs/migration.md is a different file and was already addressed; this is a distinct remaining location.

Impact. Documentation-only, no runtime effect, and the statements remain literally true for the tutorial001/002 examples in their immediate context (no eliciting resolver, so the call completes in one round). But line 61 is phrased as a general design claim, and an author reading it could hang side effects or expensive non-idempotent work on a once-per-call assumption that the same page later retracts — exactly the confusion the new info box exists to prevent.

How to fix. Reword line 61 and line 73 (and the two reinforcing tips) to the per-round / asked-once phrasing the rest of the page now uses — e.g. "the SDK runs the resolver at most once per round, however many declare it" and "each question is asked once; a resolver that never asks may re-run when a call resumes" — or add a short forward reference to the !!! info box where the full contract is stated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants