fix(chat): serialize reconnect-driven resumes to stop activeResponse race (#1837) #1838
Merged
…race (#1837)

With `resume: true` (the default), `useAgentChat` re-probes the stream from its WebSocket `onAgentOpen` handler on every reconnect. The AI SDK's `Chat.makeRequest` has no concurrency guard: every resume shares the single mutable `this.activeResponse`, and its `finally` finalizer reads `this.activeResponse.state.message` with a bare (unguarded) read before clearing it (the adjacent `finishReason` field is optional-chained, the `message` field is not). Under a reconnect storm (flaky mobile link, or a Durable Object bounce on redeploy), a later resume could overwrite and clear `activeResponse` before an earlier resume's finalizer ran, so the earlier finalizer read `undefined` and threw a handled `TypeError: Cannot read properties of undefined (reading 'state')`.

The previous guard did not close the window:
- `customTransport.isAwaitingResume()` only covers the handshake: it flips false the instant `STREAM_RESUMING` resolves, but the AI SDK only sets status to "submitted" in a later microtask (it sits behind `await transport.reconnectToStream(...)`), and
- `statusRef.current` is lagging React state that has not re-rendered yet.

So a second `open` landing in that post-handshake / pre-status-propagation window sailed past both guards and launched an overlapping resume.

Fix: serialize re-probe resumes with `resumeInFlightRef`: never issue a new `resumeStream()` while one is still outstanding. The flag is held for the whole resume lifetime and force-cleared in the socket-effect cleanup so an orphaned resume (agent swapped on a `_pk` change) can't leave the gate stuck closed. A `resumeGenerationRef` token prevents a stale, orphaned resume's late `.finally` from reopening the gate underneath a newer resume.

The definitive activeResponse-local fix belongs upstream in Vercel `ai`; this stops the SDK from triggering the overlap.

Adds a deterministic regression test (`resume-overlap-race.test.tsx`) that drives the real hook through a reconnect storm via a fake EventTarget agent and asserts the overlapping reconnect issues no second resume and the finalizer never reads a cleared activeResponse.

Co-authored-by: Cursor <cursoragent@cursor.com>
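The gate described in this commit message can be sketched in isolation. This is a minimal illustration, not the hook's actual code: `ResumeGate`, `tryStartResume`, and `cleanup` are hypothetical names, the two fields stand in for `resumeInFlightRef` and `resumeGenerationRef`, and the returned settle callback stands in for the resume promise's `.finally`.

```typescript
// Hypothetical stand-in for the hook's two refs; illustrative only.
class ResumeGate {
  private resumeInFlight = false; // plays the role of resumeInFlightRef.current
  private resumeGeneration = 0; // plays the role of resumeGenerationRef.current

  /**
   * Called on every socket `open`. Returns a settle callback (standing in for
   * the resume's `.finally`) when a resume may start, or null when one is
   * already outstanding.
   */
  tryStartResume(): (() => void) | null {
    if (this.resumeInFlight) return null; // serialize: one resume at a time
    this.resumeInFlight = true;
    const generation = this.resumeGeneration; // token captured at start
    return () => {
      // Only a resume from the current generation may reopen the gate;
      // a stale, orphaned resume settling late is ignored.
      if (generation === this.resumeGeneration) this.resumeInFlight = false;
    };
  }

  /** Socket-effect cleanup: invalidate outstanding resumes, force the gate open. */
  cleanup(): void {
    this.resumeGeneration += 1;
    this.resumeInFlight = false;
  }

  get isResumeInFlight(): boolean {
    return this.resumeInFlight;
  }
}

const gate = new ResumeGate();
const settleFirst = gate.tryStartResume(); // first open: resume starts
const overlapping = gate.tryStartResume(); // reconnect mid-resume: gated
console.log(settleFirst !== null, overlapping === null); // true true

gate.cleanup(); // agent swapped: the first resume is now orphaned
const settleSecond = gate.tryStartResume(); // new socket: resume starts
settleFirst?.(); // stale finalizer fires late and is ignored
console.log(gate.isResumeInFlight); // true: the newer resume still holds the gate
settleSecond?.();
console.log(gate.isResumeInFlight); // false
```

The generation check is what keeps the force-clear in `cleanup` safe: without it, the orphaned resume's late settle would open the gate while the newer resume is still running.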
🦋 Changeset detected
Latest commit: f99f889
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
agents
@cloudflare/ai-chat
@cloudflare/codemode
create-think
hono-agents
@cloudflare/shell
@cloudflare/think
@cloudflare/voice
@cloudflare/worker-bundler
…#1837)

A review suggested clearing `resumeInFlightRef` in `onAgentClose` for clarity. That is unsafe: the flag is owned by the in-flight resume and cleared only by its own `.finally` (or invalidated via the cleanup generation bump). Resetting it on close would set it false while the resume may still be mid-flight, re-coupling correctness to close/open task ordering and reopening the overlap window. It is also unnecessary: handshake-phase drops are gated by `isAwaitingResume()` and streaming-phase drops settle `makeRequest` (which clears the flag). Document the invariant inline so it isn't "hardened" away.

Co-authored-by: Cursor <cursoragent@cursor.com>
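The ordering hazard behind this invariant can be shown with a small simulation. It is a hypothetical model, not the hook's code: `SocketEvent`, `peakConcurrentResumes`, and the `clearOnClose` switch are invented for illustration, and `"resume-settled"` stands for the in-flight resume's own `.finally` running.

```typescript
type SocketEvent = "open" | "close" | "resume-settled";

// Replays an event order and reports the peak number of resumes outstanding
// at once. `clearOnClose` models the rejected "reset the flag on close" change.
function peakConcurrentResumes(events: SocketEvent[], clearOnClose: boolean): number {
  let resumeInFlight = false;
  let outstanding = 0;
  let peak = 0;
  for (const event of events) {
    switch (event) {
      case "open":
        if (!resumeInFlight) {
          resumeInFlight = true;
          outstanding += 1;
          peak = Math.max(peak, outstanding);
        }
        break;
      case "close":
        // Unsafe variant: the flag goes false while a resume is mid-flight.
        if (clearOnClose) resumeInFlight = false;
        break;
      case "resume-settled":
        // The owning resume's finalizer is the only place the flag is cleared.
        if (outstanding > 0) outstanding -= 1;
        resumeInFlight = false;
        break;
    }
  }
  return peak;
}

// A close and a re-open both land before the first resume has settled.
const storm: SocketEvent[] = ["open", "close", "open", "resume-settled"];
console.log(peakConcurrentResumes(storm, false)); // 1: second open is gated
console.log(peakConcurrentResumes(storm, true)); // 2: overlap window reopened
```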
This was referenced Jun 29, 2026
Merged
Summary
Fixes #1837: `useAgentChat({ resume: true })` throwing a handled `TypeError: Cannot read properties of undefined (reading 'state')` from the AI SDK's `Chat.makeRequest` finalizer during a reconnect storm.

Root cause. The AI SDK's `Chat.makeRequest` has no concurrency guard: every resume shares the single mutable `this.activeResponse`, and its `finally` finalizer reads `this.activeResponse.state.message` with a bare (unguarded) read before clearing it (the adjacent `finishReason` field is optional-chained; `message` is not). The hook re-probes the stream from its WebSocket `onAgentOpen` handler on every reconnect, and the existing guard didn't close the overlap window:
- `customTransport.isAwaitingResume()` only covers the handshake: it flips false the instant `STREAM_RESUMING` resolves, but the AI SDK only sets status to `"submitted"` in a later microtask (behind `await transport.reconnectToStream(...)`), and
- `statusRef.current` is lagging React state that hasn't re-rendered yet.

So a second socket `open` landing in that post-handshake / pre-status-propagation window (flaky mobile link, or a Durable Object bounce on redeploy) sailed past both guards and launched an overlapping resume. The later resume cleared `activeResponse` before the earlier resume's finalizer ran, so the earlier finalizer read `undefined` and threw.

Changes
- `resumeInFlightRef`: never issue a new `resumeStream()` while one is still outstanding. The flag is held for the whole resume lifetime and force-cleared in the socket-effect cleanup so an orphaned resume (agent swapped on a `_pk` change) can't leave the gate stuck closed.
- A `resumeGenerationRef` token prevents a stale, orphaned resume's late `.finally` from reopening the gate underneath a newer resume on the next socket (mirrors the existing tool-continuation generation pattern in this file).
- A deterministic regression test (`packages/agents/src/react-tests/resume-overlap-race.test.tsx`) drives the real hook through a reconnect storm via a fake `EventTarget` agent and asserts the overlapping reconnect issues no second resume and the finalizer never reads a cleared `activeResponse`. Verified it fails on `main` (3 vs 2 resume requests + the exact `state` TypeError) and passes with the fix.

The definitive `activeResponse`-local fix belongs upstream in Vercel `ai`; this stops the SDK from triggering the overlap. The fix lives in the shared `agents/chat` core hook, so it covers both `@cloudflare/think` and `@cloudflare/ai-chat`.

Residual (out of scope, same upstream root cause)
The narrower mount-resume-vs-reprobe and submit-vs-reprobe windows share the same unguarded-`activeResponse` defect; React event batching makes the submit case effectively closed, and both ultimately want the upstream finalizer fix.

Test plan
- `pnpm run check` (sherif + export checks + oxfmt + oxlint + typecheck across all 114 projects)
- `vitest --project react` (agent-tool-replay): unchanged
- `vitest --project chat` (514 tests, incl. resume-handshake): unchanged
- `@cloudflare/think` react-tests (stream-resume, studio-chat): unchanged

Made with Cursor
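For readers unfamiliar with the upstream defect named in the Summary, the failing interleaving can be reproduced with a simplified, synchronous model. This is not the AI SDK's code: `ChatModel` and its `makeRequest` are hypothetical stand-ins that keep only the two properties the PR describes, a single shared `activeResponse` and a finalizer that reads it without a guard.

```typescript
// Simplified model of the defect; NOT the AI SDK's actual implementation.
interface ActiveResponse {
  state: { message: string };
}

class ChatModel {
  private activeResponse: ActiveResponse | undefined;

  /** Starts a request and returns its finalizer (standing in for `finally`). */
  makeRequest(message: string): () => string {
    this.activeResponse = { state: { message } }; // shared slot, overwritten per call
    return () => {
      // Bare read: throws a TypeError if another request already cleared the slot.
      const finished = this.activeResponse!.state.message;
      this.activeResponse = undefined;
      return finished;
    };
  }
}

// Overlapping resumes: the later one settles first and clears the shared slot.
function overlappingResumesThrow(): boolean {
  const chat = new ChatModel();
  const finalizeEarlier = chat.makeRequest("resume A");
  const finalizeLater = chat.makeRequest("resume B"); // overwrites activeResponse
  finalizeLater(); // clears activeResponse
  try {
    finalizeEarlier(); // reads `state` of undefined
    return false;
  } catch (error) {
    return error instanceof TypeError;
  }
}

// Serialized resumes: each finalizer runs before the next request starts.
function serializedResumesFinish(): string[] {
  const chat = new ChatModel();
  const first = chat.makeRequest("resume A")();
  const second = chat.makeRequest("resume B")();
  return [first, second];
}

console.log(overlappingResumesThrow()); // true
console.log(serializedResumesFinish()); // [ 'resume A', 'resume B' ]
```

Serializing the callers avoids the throw without touching the finalizer, which is why the hook-side gate works even though the unguarded read remains upstream.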