fix(chat): serialize reconnect-driven resumes to stop activeResponse race (#1837) #1838

Merged: threepointone merged 2 commits into main from fix/resume-overlap-race on Jun 29, 2026

threepointone commented Jun 29, 2026

Summary

Fixes #1837: useAgentChat({ resume: true }) throwing a handled TypeError: Cannot read properties of undefined (reading 'state') from the finalizer of the AI SDK's Chat.makeRequest during a reconnect storm.

Root cause. The AI SDK's Chat.makeRequest has no concurrency guard: every resume shares the single mutable this.activeResponse, and its finally finalizer reads this.activeResponse.state.message with a bare (unguarded) read before clearing it (the adjacent finishReason field is optional-chained; message is not). The hook re-probes the stream from its WebSocket onAgentOpen handler on every reconnect, and the two existing guards didn't close the overlap window:

  • customTransport.isAwaitingResume() only covers the handshake: it flips to false the instant STREAM_RESUMING resolves, but the AI SDK only sets status to "submitted" in a later microtask (behind await transport.reconnectToStream(...)), and
  • statusRef.current mirrors React state, so it lags until the next re-render.

So a second socket open landing in that post-handshake / pre-status-propagation window (flaky mobile link, or a Durable Object bounce on redeploy) sailed past both guards and launched an overlapping resume. The later resume cleared activeResponse before the earlier resume's finalizer ran, so the earlier finalizer read undefined and threw.
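
The overlap can be modelled outside React in a few lines. This is a minimal sketch under stated assumptions: MiniChat, deferred, and overlappingResumes are hypothetical names invented for illustration, and the class only mimics the shape described above (one shared slot, a bare read in finally). It is not the AI SDK's source.

```typescript
// Simplified model of the hazard. MiniChat is hypothetical and is NOT the AI SDK's code.
type ActiveResponse = { state: { message: string } };

class MiniChat {
  // One mutable slot shared by every request, as described above.
  private activeResponse: ActiveResponse | undefined;

  async makeRequest(id: string, transport: Promise<void>): Promise<string> {
    this.activeResponse = { state: { message: id } };
    let finished = "";
    try {
      await transport;
    } finally {
      // Bare read: throws if an overlapping request already cleared the slot.
      finished = this.activeResponse!.state.message;
      this.activeResponse = undefined;
    }
    return finished;
  }
}

// Small helper: a promise that is resolved from outside.
function deferred(): { promise: Promise<void>; resolve: () => void } {
  let resolve: () => void = () => {};
  const promise = new Promise<void>((r) => {
    resolve = () => r();
  });
  return { promise, resolve };
}

async function overlappingResumes(): Promise<string[]> {
  const chat = new MiniChat();
  const a = deferred();
  const b = deferred();
  // Convert each request into a printable outcome so rejections are always handled.
  const outcome = (p: Promise<string>): Promise<string> =>
    p.then(
      (value) => `ok:${value}`,
      (err: unknown) => `error:${err instanceof Error ? err.name : "unknown"}`
    );

  const earlier = outcome(chat.makeRequest("resume-1", a.promise));
  const later = outcome(chat.makeRequest("resume-2", b.promise)); // overwrites the slot

  b.resolve(); // the later resume settles first and clears the slot
  a.resolve(); // the earlier finalizer now reads undefined
  return Promise.all([earlier, later]);
}

overlappingResumes().then((r) => console.log(r));
// logs [ 'error:TypeError', 'ok:resume-2' ]
```

In this model the earlier request rejects with a TypeError while the later one completes normally, which matches the failure order described above.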

Changes

  • Serialize re-probe resumes via resumeInFlightRef: never issue a new resumeStream() while one is still outstanding. The flag is held for the whole resume lifetime and force-cleared in the socket-effect cleanup so an orphaned resume (agent swapped on a _pk change) can't leave the gate stuck closed.
  • A resumeGenerationRef token prevents a stale, orphaned resume's late .finally from reopening the gate underneath a newer resume on the next socket (mirrors the existing tool-continuation generation pattern in this file).
  • A deterministic regression test (packages/agents/src/react-tests/resume-overlap-race.test.tsx) drives the real hook through a reconnect storm via a fake EventTarget agent and asserts that the overlapping reconnect issues no second resume and that the finalizer never reads a cleared activeResponse. Verified it fails on main (3 resume requests where 2 are expected, plus the exact state TypeError) and passes with the fix.
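
The gate and the generation token in the first two bullets can be sketched without React. This is an illustration only, not the code in packages/agents/src/chat/react.tsx: ResumeGate, tryResume, cleanup, and reconnectStorm are hypothetical names, and two plain fields stand in for resumeInFlightRef and resumeGenerationRef.

```typescript
// Hypothetical stand-in for the hook's refs; not the actual hook code.
class ResumeGate {
  private inFlight = false; // plays the role of resumeInFlightRef
  private generation = 0; // plays the role of resumeGenerationRef

  // Called on every socket open. Returns false when the re-probe is skipped.
  tryResume(resumeStream: () => Promise<void>): boolean {
    if (this.inFlight) return false; // a resume is still outstanding
    this.inFlight = true;
    void this.run(resumeStream, this.generation);
    return true;
  }

  private async run(resumeStream: () => Promise<void>, mine: number): Promise<void> {
    try {
      await resumeStream();
    } catch {
      // Errors are ignored in this sketch; only the gate bookkeeping matters here.
    } finally {
      // A stale resume from an older generation must not reopen the gate.
      if (this.generation === mine) this.inFlight = false;
    }
  }

  // Socket-effect cleanup: invalidate orphaned resumes and force the gate open.
  cleanup(): void {
    this.generation += 1;
    this.inFlight = false;
  }
}

function deferred(): { promise: Promise<void>; resolve: () => void } {
  let resolve: () => void = () => {};
  const promise = new Promise<void>((r) => {
    resolve = () => r();
  });
  return { promise, resolve };
}

// Lets pending promise callbacks run before the next step.
const flush = (): Promise<void> => new Promise<void>((r) => setTimeout(r, 0));

async function reconnectStorm(): Promise<boolean[]> {
  const gate = new ResumeGate();
  const first = deferred();
  const second = deferred();
  const issued: boolean[] = [];

  issued.push(gate.tryResume(() => first.promise)); // first open: resume issued
  issued.push(gate.tryResume(() => first.promise)); // overlapping open: skipped
  gate.cleanup(); // agent swapped, first resume is now orphaned
  issued.push(gate.tryResume(() => second.promise)); // new socket: resume issued
  first.resolve(); // the orphaned resume settles late
  await flush();
  issued.push(gate.tryResume(() => second.promise)); // gate is still closed: skipped
  second.resolve();
  await flush();
  issued.push(gate.tryResume(() => Promise.resolve())); // gate reopened: issued
  return issued;
}

reconnectStorm().then((r) => console.log(r));
// logs [ true, false, true, false, true ]
```

The fourth entry is the case the generation token exists for: without the generation check, the orphaned resume's late settle would set the flag back to false and let that open start an overlapping resume.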

The definitive activeResponse-local fix belongs upstream in Vercel ai; this change only stops the agents SDK from triggering the overlap. The fix lives in the shared agents/chat core hook, so it covers both @cloudflare/think and @cloudflare/ai-chat.
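
For contrast, here is one possible shape of a finalizer that keeps its response in a local variable. This is purely an assumption about what an activeResponse-local approach could look like: LocalCaptureChat and overlapWithLocalCapture are hypothetical names, and nothing here is taken from the AI SDK or from a patch proposed upstream.

```typescript
// Hypothetical sketch of a finalizer that never reads the shared slot. NOT the AI SDK's code.
type ActiveResponse = { state: { message: string } };

class LocalCaptureChat {
  private activeResponse: ActiveResponse | undefined;

  async makeRequest(id: string, transport: Promise<void>): Promise<string> {
    // Keep a local reference so the finalizer does not depend on the shared slot.
    const mine: ActiveResponse = { state: { message: id } };
    this.activeResponse = mine;
    try {
      await transport;
    } finally {
      // Clear the shared slot only if it still belongs to this request.
      if (this.activeResponse === mine) this.activeResponse = undefined;
    }
    return mine.state.message;
  }
}

function deferred(): { promise: Promise<void>; resolve: () => void } {
  let resolve: () => void = () => {};
  const promise = new Promise<void>((r) => {
    resolve = () => r();
  });
  return { promise, resolve };
}

async function overlapWithLocalCapture(): Promise<string[]> {
  const chat = new LocalCaptureChat();
  const a = deferred();
  const b = deferred();
  const earlier = chat.makeRequest("resume-1", a.promise);
  const later = chat.makeRequest("resume-2", b.promise); // overwrites the shared slot
  b.resolve(); // later request settles first
  a.resolve(); // earlier finalizer still has its own local reference
  return Promise.all([earlier, later]);
}

overlapWithLocalCapture().then((r) => console.log(r));
// logs [ 'resume-1', 'resume-2' ]
```

With a local capture, overlapping requests can both finish without throwing, which is why the hook-side serialization is described as a mitigation rather than the definitive fix.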

Residual (out of scope, same upstream root cause)

The narrower mount-resume-vs-reprobe and submit-vs-reprobe windows share the same unguarded-activeResponse defect; React event batching makes the submit case effectively closed, and both ultimately want the upstream finalizer fix.

Test plan

  • pnpm run check (sherif + export checks + oxfmt + oxlint + typecheck across all 114 projects)
  • New regression test passes with the fix; fails without it
  • vitest --project react (agent-tool-replay) — unchanged
  • vitest --project chat (514 tests, incl. resume-handshake) — unchanged
  • @cloudflare/think react-tests (stream-resume, studio-chat) — unchanged

Made with Cursor



…race (#1837)

With `resume: true` (the default), `useAgentChat` re-probes the stream from
its WebSocket `onAgentOpen` handler on every reconnect. The AI SDK's
`Chat.makeRequest` has no concurrency guard: every resume shares the single
mutable `this.activeResponse`, and its `finally` finalizer reads
`this.activeResponse.state.message` with a bare (unguarded) read before
clearing it (the adjacent `finishReason` field is optional-chained, the
`message` field is not). Under a reconnect storm (flaky mobile link, or a
Durable Object bounce on redeploy), a later resume could overwrite + clear
`activeResponse` before an earlier resume's finalizer ran, so the earlier
finalizer read `undefined` and threw a handled
`TypeError: Cannot read properties of undefined (reading 'state')`.

The previous guard did not close the window:
- `customTransport.isAwaitingResume()` only covers the handshake — it flips
  false the instant `STREAM_RESUMING` resolves, but the AI SDK only sets
  status to "submitted" in a later microtask (it sits behind
  `await transport.reconnectToStream(...)`), and
- `statusRef.current` is lagging React state that has not re-rendered yet.

So a second `open` landing in that post-handshake / pre-status-propagation
window sailed past both guards and launched an overlapping resume.

Fix: serialize re-probe resumes with `resumeInFlightRef` — never issue a new
`resumeStream()` while one is still outstanding. The flag is held for the
whole resume lifetime and force-cleared in the socket-effect cleanup so an
orphaned resume (agent swapped on a `_pk` change) can't leave the gate stuck
closed. A `resumeGenerationRef` token prevents a stale, orphaned resume's
late `.finally` from reopening the gate underneath a newer resume.

The definitive activeResponse-local fix belongs upstream in Vercel `ai`; this
stops the SDK from triggering the overlap.

Adds a deterministic regression test (`resume-overlap-race.test.tsx`) that
drives the real hook through a reconnect storm via a fake EventTarget agent
and asserts the overlapping reconnect issues no second resume and the
finalizer never reads a cleared activeResponse.

Co-authored-by: Cursor <cursoragent@cursor.com>

changeset-bot Bot commented Jun 29, 2026


🦋 Changeset detected

Latest commit: f99f889

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package:

| Name | Type |
| ---- | ---- |
| agents | Patch |


@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 potential issue.


Comment thread packages/agents/src/chat/react.tsx

pkg-pr-new Bot commented Jun 29, 2026


agents

npm i https://pkg.pr.new/agents@1838

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1838

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1838

create-think

npm i https://pkg.pr.new/create-think@1838

hono-agents

npm i https://pkg.pr.new/hono-agents@1838

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1838

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1838

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1838

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1838

commit: f99f889

…#1837)

A review suggested clearing `resumeInFlightRef` in `onAgentClose` for clarity.
That is unsafe: the flag is owned by the in-flight resume and cleared only by
its own `.finally` (or invalidated via the cleanup generation bump). Resetting
it on close would set it false while the resume may still be mid-flight,
re-coupling correctness to close/open task ordering and reopening the overlap
window. It is also unnecessary — handshake-phase drops are gated by
`isAwaitingResume()` and streaming-phase drops settle `makeRequest` (which
clears the flag). Document the invariant inline so it isn't "hardened" away.

Co-authored-by: Cursor <cursoragent@cursor.com>

Development

Successfully merging this pull request may close these issues.

useAgentChat reconnect-driven resume races AI SDK Chat.makeRequest finalizer → "Cannot read properties of undefined (reading 'state')"
