fix(streamable-http): reject duplicate in-flight request ids #3063
Sammy-Dabbas wants to merge 3 commits
The transport keyed per-request routing by request id and assigned the slot without checking for an existing entry, so a second concurrent POST with the same id silently overwrote the first request's routing slot. One request received the other's payload and the other hung. Reject a POST whose request id is already in flight on the session with HTTP 400 and JSON-RPC -32600. Ids may still be reused once the earlier request completes, which deployed clients rely on. Fixes modelcontextprotocol#3060.
1 issue found across 2 files
Good catch on the priming race, that was a real gap: the guard ran before the awaited store_event, so a same-id POST could slip in during event-store persistence. Fixed by reserving the routing slot synchronously with the guard and minting the priming event afterwards, with the reservation released if the event store raises so the 500 path leaves nothing in flight. Added a regression test that suspends a gated event store mid-priming and asserts the concurrent duplicate is still rejected; it fails by timeout on the previous commit and passes now. The full transport module is green (68 tests).
In resumable SSE mode the priming event is minted by awaiting EventStore.store_event between the duplicate-id guard and the routing slot registration, so a concurrent POST reusing the id could pass the guard during that await and overwrite the slot. Reserve the slot synchronously with the guard and mint the priming event afterwards, releasing the reservation if the event store raises so the outer 500 path leaves nothing in flight. Adds a regression test that suspends a gated event store mid-priming and asserts the duplicate POST is still rejected.
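The race described above can be reproduced in isolation. The following is a minimal sketch, not the transport's actual code: `register_racy`, `register_reserved`, `DuplicateRequestId`, and the gated `store_event` stand-in are hypothetical names, and a plain dict stands in for the routing slots. It contrasts a guard that is separated from the slot assignment by an await with one that reserves the slot before awaiting.

```python
import asyncio


class DuplicateRequestId(Exception):
    """Hypothetical error for a request id that is already in flight."""


async def store_event(gate: asyncio.Event) -> None:
    # Stand-in for an awaited event-store write; suspends until the gate opens.
    await gate.wait()


async def register_racy(slots, request_id, owner, gate):
    if request_id in slots:
        raise DuplicateRequestId(request_id)
    await store_event(gate)      # a same-id POST can pass the guard here
    slots[request_id] = owner    # may overwrite another request's slot


async def register_reserved(slots, request_id, owner, gate):
    if request_id in slots:
        raise DuplicateRequestId(request_id)
    slots[request_id] = owner    # reserve before the first await
    try:
        await store_event(gate)
    except BaseException:
        # Release the reservation so a failed request leaves nothing in flight.
        slots.pop(request_id, None)
        raise


async def run(register):
    slots = {}
    gate = asyncio.Event()
    first = asyncio.create_task(register(slots, 1, "first", gate))
    await asyncio.sleep(0)       # let the first POST reach the event store
    second = asyncio.create_task(register(slots, 1, "second", gate))
    await asyncio.sleep(0)       # let the second POST run its guard
    gate.set()
    results = await asyncio.gather(first, second, return_exceptions=True)
    return slots, results
```

With `register_racy` both tasks pass the guard and the second assignment wins; with `register_reserved` the second task is rejected while the first is still suspended in the event store.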
Nice, the slot reservation before priming closes the resumable-SSE race cleanly. Two mechanical CI things are what's currently red, both in the newly added test code rather than the fix itself:
1. Coverage gate (the 20 [...]). Line 844 in the new test is uncovered (this repo enforces [...]).
2. pre-commit (pyright). The [...] Looks like [...]
Once those two are green the change looks solid. For what it's worth, I'm putting up the dispatcher-layer piece we discussed (the [...]
Mark the test helper's defensive raise no-cover, matching the module helper's convention, and give the request dicts concrete types so the dictionary unpacks resolve under pyright. Import the protocol version header from its defining module.
Thanks for running those down, both fixed. The uncovered line was the test helper's defensive raise, now marked no-cover matching the module-level first_sse_data helper's convention. The pyright errors came from the untyped dict unpacks; init_request and the tool call payload now have concrete types, and I moved the protocol version header import to its defining module while there. pyright is clean on the file and the coverage gate passes locally apart from pre-existing platform-conditional lines in other test files that only execute on Linux.
Glad the dispatcher piece is up as #3064, and the scoping is exactly right. The two guards compose and neither alone closes the issue, so from my side please keep it open.
Fixes #3060.
The stateful streamable HTTP transport keys per-request routing by the request id and assigns the slot without an existence check. Two concurrent POSTs sharing an id cross-wire: the second overwrites the first's routing slot, the first request's response is delivered to the second POST, and the first hangs while its stream leaks. Reproduced on main before the fix.
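The cross-wiring is easy to see in a toy model. This is a sketch under assumptions, not the transport's implementation: a dict of `asyncio.Queue` objects keyed by request id stands in for the per-request routing, and the `demo` function and the timeout value are made up for illustration.

```python
import asyncio


async def demo():
    # Hypothetical routing table: request id -> response stream.
    streams: dict[int, asyncio.Queue[str]] = {}

    # First POST with id 1 registers its stream.
    first: asyncio.Queue[str] = asyncio.Queue()
    streams[1] = first

    # A concurrent POST reusing id 1 overwrites the slot with no check.
    second: asyncio.Queue[str] = asyncio.Queue()
    streams[1] = second

    # The server finishes the first request and routes the response by id.
    await streams[1].put("response to first request")

    # The second POST receives the first request's response...
    delivered_to_second = second.get_nowait()

    # ...while the first POST waits on a stream nothing will ever write to.
    try:
        await asyncio.wait_for(first.get(), timeout=0.05)
        first_hung = False
    except asyncio.TimeoutError:
        first_hung = True

    return delivered_to_second, first_hung
```

Running `asyncio.run(demo())` shows the first request's response arriving on the second stream while the first stream times out, which matches the symptom reported in the issue.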
This adds a guard in `_handle_post_request` that rejects a POST whose request id is already in flight on the session with HTTP 400 and JSON-RPC -32600, placed before the JSON/SSE branch so both response modes are covered. The spec requires request ids to be unique within a session. Sequential reuse of an id after the earlier request completes still works, since deployed clients send a constant id for every request; a regression test pins that behavior.

Tests: the new duplicate-id test fails on unpatched main and passes with the fix. tests/shared/test_streamable_http.py passes 67/67 and the full suite passes locally (5282 passed). A stress run of 500 rapid sequential same-id requests produced no spurious rejections. ruff and pyright are clean on the touched files.
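The behavior the guard is meant to pin down can be sketched as follows. This is an illustration only: the `Session` class, its `begin`/`complete` methods, the error message text, the echoed `id` field, and the 200 status for accepted requests are assumptions, not the PR's code. Only the HTTP 400 status and the JSON-RPC -32600 (Invalid Request) code come from the description above.

```python
from http import HTTPStatus

INVALID_REQUEST = -32600  # JSON-RPC 2.0 "Invalid Request"


class Session:
    """Hypothetical stand-in for per-session in-flight tracking."""

    def __init__(self) -> None:
        self.in_flight: set = set()

    def begin(self, request_id):
        """Return (status, error_body); error_body is None when accepted."""
        if request_id in self.in_flight:
            # Duplicate id while the earlier request is still running.
            return HTTPStatus.BAD_REQUEST, {
                "jsonrpc": "2.0",
                "id": request_id,  # assumption: the sketch echoes the id
                "error": {
                    "code": INVALID_REQUEST,
                    "message": "Request id already in flight",  # illustrative
                },
            }
        self.in_flight.add(request_id)
        return HTTPStatus.OK, None

    def complete(self, request_id) -> None:
        # Once the request finishes, the id may be reused.
        self.in_flight.discard(request_id)
```

A short usage example, covering both the rejection and the sequential reuse that deployed clients rely on:

```python
session = Session()
session.begin(1)                  # accepted
status, body = session.begin(1)   # rejected: id 1 is still in flight
session.complete(1)
session.begin(1)                  # accepted again after completion
```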
Scope note: this is the transport-level guard only. The dispatcher-level blind overwrite in jsonrpc_dispatcher.py (TODO from #3046) is deliberately left to the follow-up discussed on the issue, since the two guards compose.
Disclosure: this change was developed with AI assistance (Claude Code). I reviewed the change and the test results before submitting.