
Add sandbox-coding-agent example: Think orchestrating Claude Code in containers #1830

Merged: threepointone merged 1 commit into main from example/sandbox-coding-agent on Jun 28, 2026

threepointone (Contributor) commented Jun 28, 2026

Summary

A new example (examples/sandbox-coding-agent) demonstrating Think as an orchestrator over containerized coding agents — the Cloudflare-native take on the "agent harness" pattern, one level up. You chat with a CodingOrchestrator (Think); it delegates each concrete coding task to a Claude Code sub-agent running in its own Cloudflare Sandbox container, and streams each sub-agent's narration, tool calls, and final git diff back into the chat.

  • Agents-as-tools / sub-agents. The orchestrator exposes agentTool(ClaudeCodeAgent, …); each delegated task runs as a facet with its own isolated container (getSandbox + a hashed, DNS-safe sandbox id). delegate_parallel fans out across containers via runAgentTool.
  • Zero-token AI Gateway. The Sandbox subclass intercepts the container's api.anthropic.com egress (outboundByHost + interceptHttps) and forwards it through env.AI.gateway(), which authenticates via your Cloudflare account. No Anthropic key or AI Gateway token lives in the container — only a plaintext GATEWAY_ID. (Requires exporting the SDK's ContainerProxy from the Worker entry.)
  • Think owns planning; Claude Code owns coding. Orchestrator loop runs on Workers AI; each sub-agent drives claude -p headless inside its container and maps stream-json to AI SDK UIMessage chunks. beforeTurn restricts the orchestrator to its delegation tools so it can't wander into Think's built-in workspace tools. CLI stderr/exit/result errors are surfaced into the delegate panel instead of silently showing "no changes".
  • Docs. README explains the no-token trick, the architecture, and a Durability & recovery section (three lifecycles; the container disk is ephemeral and resets across sleepAfter) with deferred upgrade paths (Sandbox backup/restore; harness-based mid-turn continuity, tracked in #1829, "Add an @ai-sdk/sandbox-cloudflare HarnessAgent provider (during ai v7 migration)").
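The first bullet mentions a hashed, DNS-safe sandbox id for each delegated run. The helper below is a minimal sketch of one way to derive such an id, assuming the constraints of a single DNS label (lowercase `[a-z0-9-]`, at most 63 characters). The name `sandboxIdFor` and the choice of 32-bit FNV-1a are illustrative stand-ins, not necessarily what the example uses; a deployment that cares about collisions would likely want a longer hash.

```typescript
// Hypothetical helper: derive a DNS-safe sandbox id from an arbitrary run key.
// 32-bit FNV-1a is a stand-in hash; the example may use a different one.
const FNV_OFFSET_BASIS = 0x811c9dc5;
const FNV_PRIME = 0x01000193;
const MAX_DNS_LABEL = 63;

function fnv1a32(input: string): number {
  let hash = FNV_OFFSET_BASIS;
  for (const byte of new TextEncoder().encode(input)) {
    hash ^= byte;
    // Math.imul keeps the multiply in 32 bits; >>> 0 makes the result unsigned.
    hash = Math.imul(hash, FNV_PRIME) >>> 0;
  }
  return hash;
}

function sandboxIdFor(prefix: string, runKey: string): string {
  // Keep a readable slug: lowercase, only [a-z0-9-], no leading/trailing '-'.
  const slug = prefix
    .toLowerCase()
    .replace(/[^a-z0-9-]+/g, "-")
    .replace(/-+/g, "-")
    .replace(/^-|-$/g, "");
  const hash = fnv1a32(runKey).toString(16).padStart(8, "0");
  // Reserve room for "-" plus 8 hex chars so the id stays a valid DNS label.
  const head = slug.slice(0, MAX_DNS_LABEL - 9).replace(/-$/, "");
  return head ? `${head}-${hash}` : hash;
}
```

Because the id is a pure function of the run key, the same delegated run maps back to the same container, while distinct runs land in distinct ones (modulo hash collisions).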
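The zero-token trick hinges on deciding, per outbound request, whether traffic leaving the container targets `api.anthropic.com` and should be handed to the gateway. The PR implements the interception with the Sandbox SDK's `outboundByHost` + `interceptHttps` and forwards through `env.AI.gateway()`; those signatures are not shown in the PR, so this sketch isolates only the routing decision as a pure function over the standard `Request`/`Headers` APIs. The name `routeEgress`, the `EgressDecision` type, and stripping credential headers defensively are assumptions for illustration.

```typescript
// Hypothetical sketch of the routing decision behind the egress intercept.
const INTERCEPTED_HOST = "api.anthropic.com";
// Stripped defensively: per the PR, no provider credential should live in,
// or be trusted from, the container.
const CREDENTIAL_HEADERS = ["x-api-key", "authorization"];

type EgressDecision =
  | { kind: "passthrough"; request: Request }
  | { kind: "gateway"; provider: "anthropic"; path: string; request: Request };

function routeEgress(request: Request): EgressDecision {
  const url = new URL(request.url);
  if (url.hostname !== INTERCEPTED_HOST) {
    // Not provider traffic: leave it alone.
    return { kind: "passthrough", request };
  }
  const headers = new Headers(request.headers);
  for (const name of CREDENTIAL_HEADERS) headers.delete(name);
  return {
    kind: "gateway",
    provider: "anthropic",
    // Path + query are what a gateway forwarder would need to replay the call.
    path: url.pathname + url.search,
    request: new Request(request, { headers }),
  };
}
```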
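Each sub-agent maps Claude Code's `stream-json` output to AI SDK `UIMessage` chunks and surfaces result errors. The following is a minimal sketch of that mapping for a single NDJSON line. It assumes the commonly seen event shapes (`assistant`/`user` messages carrying `text`, `tool_use`, and `tool_result` content blocks, plus a terminal `result` event with `subtype`/`is_error`) and defines local chunk types modeled on the AI SDK's UI message stream instead of importing them; `mapStreamJsonLine` is a hypothetical name.

```typescript
// Local types modeled on AI SDK UI message stream chunks (assumption: only the
// chunk kinds this sketch emits; the real union is larger).
type UIChunk =
  | { type: "text-start"; id: string }
  | { type: "text-delta"; id: string; delta: string }
  | { type: "text-end"; id: string }
  | { type: "tool-input-available"; toolCallId: string; toolName: string; input: unknown }
  | { type: "tool-output-available"; toolCallId: string; output: unknown }
  | { type: "error"; errorText: string };

// Loose shape of one stream-json line; fields this sketch doesn't read are ignored.
type StreamEvent = {
  type?: string;
  subtype?: string;
  is_error?: boolean;
  result?: string;
  message?: { content?: Array<Record<string, unknown>> };
};

function mapStreamJsonLine(line: string, nextId: () => string): UIChunk[] {
  const trimmed = line.trim();
  if (!trimmed) return [];
  let event: StreamEvent;
  try {
    event = JSON.parse(trimmed) as StreamEvent;
  } catch {
    // Surface garbage instead of dropping it silently.
    return [{ type: "error", errorText: `unparseable stream-json line: ${trimmed.slice(0, 80)}` }];
  }
  const chunks: UIChunk[] = [];
  if (event.type === "assistant" || event.type === "user") {
    for (const block of event.message?.content ?? []) {
      const text = block.text;
      if (event.type === "assistant" && block.type === "text" && typeof text === "string") {
        // Each text block becomes one start/delta/end triple.
        const id = nextId();
        chunks.push({ type: "text-start", id }, { type: "text-delta", id, delta: text }, { type: "text-end", id });
      } else if (block.type === "tool_use") {
        chunks.push({
          type: "tool-input-available",
          toolCallId: String(block.id),
          toolName: String(block.name),
          input: block.input,
        });
      } else if (block.type === "tool_result") {
        chunks.push({ type: "tool-output-available", toolCallId: String(block.tool_use_id), output: block.content });
      }
    }
  } else if (event.type === "result" && (event.is_error || event.subtype !== "success")) {
    chunks.push({
      type: "error",
      errorText: event.result || `Claude Code finished with subtype ${event.subtype ?? "unknown"}`,
    });
  }
  return chunks;
}
```

Emitting an `error` chunk for a failed `result` (rather than nothing) is what lets a delegate panel show the failure instead of an empty "no changes" state.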

Pins @cloudflare/sandbox to 0.12.1 — the 0.12.2 image failed to publish to Docker Hub.

Examples don't require a changeset.

Test plan

  • pnpm run check (sherif + export checks + oxfmt + oxlint + typecheck — all 114 projects typecheck)
  • Ran locally end-to-end: orchestrator delegates a single task and a parallel fan-out; Claude Code runs in-container, egress routes through AI Gateway with no token, diffs render in the delegate panels.
  • Reviewer: requires Docker locally (paid Workers plan to deploy) and an AI Gateway that can reach Anthropic without a per-request key (Unified Billing or BYOK).

Made with Cursor



… Code in containers

A Think agent that orchestrates Claude Code coding agents, each running in
its own Cloudflare Sandbox container, with live progress and diffs streamed
into the chat.

Highlights:
- Agents-as-tools: the orchestrator delegates each task to a ClaudeCodeAgent
  facet (one isolated container per run via getSandbox + a derived sandbox id);
  delegate_parallel fans out across containers.
- Zero-token AI Gateway: the Sandbox subclass intercepts the container's
  api.anthropic.com egress (outboundByHost + interceptHttps) and routes it
  through env.AI.gateway(), so no Anthropic key or AI Gateway token lives in
  the container — only a plaintext GATEWAY_ID. Requires exporting the SDK's
  ContainerProxy from the Worker entry.
- Claude Code runs headless as root with IS_SANDBOX=1; its stream-json is
  mapped to AI SDK UIMessage chunks and stderr/exit/result errors are surfaced.
- beforeTurn restricts the orchestrator to its delegation tools so it can't
  wander into Think's built-in workspace tools.
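`delegate_parallel` fans out across containers. The PR does this through `runAgentTool`; the sketch below substitutes a caller-supplied `runTask` function (hypothetical, as are `delegateParallel` and `TaskResult`) to show the shape of a fan-out in which one failing sub-agent does not discard the other sub-agents' diffs.

```typescript
// Hypothetical result shape for one delegated task.
type TaskResult =
  | { task: string; ok: true; diff: string }
  | { task: string; ok: false; error: string };

// Runs every task concurrently; a failure in one sub-agent is captured as a
// per-task error instead of rejecting the whole fan-out.
async function delegateParallel(
  tasks: readonly string[],
  runTask: (task: string) => Promise<string>,
): Promise<TaskResult[]> {
  return Promise.all(
    tasks.map(async (task): Promise<TaskResult> => {
      try {
        return { task, ok: true, diff: await runTask(task) };
      } catch (err) {
        return { task, ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }),
  );
}
```

`Promise.all` over per-task `try`/`catch` behaves like `Promise.allSettled` but keeps the task name attached to each outcome, and results come back in input order.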
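`beforeTurn` restricts the orchestrator to its delegation tools. Think's hook signature is not shown in this PR, so the sketch below is a standalone helper that filters a name-keyed tool record down to an allowlist. `restrictTools` and the `delegate` tool name are hypothetical; only `delegate_parallel` appears in the text.

```typescript
// Hypothetical allowlist; only "delegate_parallel" is named in the PR.
const DELEGATION_TOOLS = ["delegate", "delegate_parallel"] as const;

// Returns a copy of `tools` containing only allow-listed names, so the model
// never sees (and therefore cannot call) anything else on this turn.
function restrictTools<T>(tools: Record<string, T>, allowed: readonly string[]): Record<string, T> {
  const allow = new Set(allowed);
  return Object.fromEntries(Object.entries(tools).filter(([name]) => allow.has(name)));
}
```

Filtering the tool set itself, rather than instructing the model to avoid certain tools, means a built-in workspace tool simply isn't offered on that turn.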

Pins @cloudflare/sandbox to 0.12.1 (the 0.12.2 image failed to publish to
Docker Hub, cloudflare/sandbox-sdk#792).

README documents the durability/recovery model (three lifecycles, the
ephemeral container disk) and the deferred backup/restore + harness-migration
upgrade paths (#1829).

Co-authored-by: Cursor <cursoragent@cursor.com>
changeset-bot (bot) commented Jun 28, 2026

⚠️ No Changeset found

Latest commit: e7de6d0

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types.


devin-ai-integration (bot) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.


threepointone merged commit 2351e5c into main on Jun 28, 2026 (4 checks passed)
threepointone deleted the example/sandbox-coding-agent branch on Jun 28, 2026 at 20:17
threepointone added a commit that referenced this pull request Jun 28, 2026
* docs(design): add rfc-coding-agent (first-class CodingAgent for Think)

Proposes promoting examples/sandbox-coding-agent (#1830) into @cloudflare/think
as a supported `CodingAgent` class — a Think subclass that drives a CLI coding
agent (Claude Code first) inside a Cloudflare Sandbox, exported per-CLI as
`@cloudflare/think/claudecode`.

Locks the surface before any core code moves: an internal TurnRuntime seam in
Think (private, CodingAgent is its only consumer), a per-CLI adapter contract,
tokenless AI Gateway egress + snapshot-based durability built in, DO-tuned
recovery, and a conformance-test strategy for stream-json drift. Strategic
stance: own the public interface, keep the engine swappable (a matured
@ai-sdk/harness could become an impl detail behind the same class — #1829).

Status: proposed.
Co-authored-by: Cursor <cursoragent@cursor.com>
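The RFC calls for a conformance-test strategy for `stream-json` drift. One possible shape, sketched here under assumptions, is a CI check that runs a recorded CLI transcript through a scanner and reports event and content-block types the mapper does not recognize, so a Claude Code upgrade that changes the stream fails loudly instead of silently dropping output. The known-type sets and the `checkStreamJsonDrift` name are illustrative, not taken from the RFC.

```typescript
// Illustrative "known" sets; a real suite would derive these from the mapper.
const KNOWN_EVENT_TYPES = new Set(["system", "assistant", "user", "result"]);
const KNOWN_BLOCK_TYPES = new Set(["text", "thinking", "tool_use", "tool_result"]);

type DriftReport = {
  unknownEventTypes: string[];
  unknownBlockTypes: string[];
  unparseableLines: number;
};

// Scans an NDJSON transcript and collects anything the mapper wouldn't handle.
function checkStreamJsonDrift(ndjson: string): DriftReport {
  const unknownEvents = new Set<string>();
  const unknownBlocks = new Set<string>();
  let unparseableLines = 0;
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      unparseableLines++;
      continue;
    }
    if (typeof parsed !== "object" || parsed === null) {
      unparseableLines++;
      continue;
    }
    const event = parsed as { type?: unknown; message?: { content?: unknown } };
    const eventType = String(event.type);
    if (!KNOWN_EVENT_TYPES.has(eventType)) unknownEvents.add(eventType);
    const content = event.message?.content;
    if (!Array.isArray(content)) continue;
    for (const block of content) {
      const blockType = String((block as { type?: unknown } | null)?.type);
      if (!KNOWN_BLOCK_TYPES.has(blockType)) unknownBlocks.add(blockType);
    }
  }
  return {
    unknownEventTypes: Array.from(unknownEvents).sort(),
    unknownBlockTypes: Array.from(unknownBlocks).sort(),
    unparseableLines,
  };
}
```

A CI test would then assert that all three fields of the report are empty for each recorded fixture.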

* docs(design): expand rfc-coding-agent with config, threads, and seams

Folds in the firm follow-ups from review:
- §8 dynamic config (resolve-by-precedence, freeze-on-first-turn) + topology
  (standalone / Chats-child threads / orchestrated), with the "no top-level
  binding assumption" requirement.
- §9 two seams designed in from day one: a filesystem backend interface (so the
  durable cloudflare/workspace VFS can supersede snapshots later) and run/preview
  by target (container dev server vs worker-bundler + env.LOADER).
- §10 future-work pointer to a Workers-native runtime (Runtime B) behind the same
  TurnRuntime seam.
- New alternative (adopt cloudflare/workspace now — rejected, preview-only) and
  expanded decision questions.

Co-authored-by: Cursor <cursoragent@cursor.com>
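§8 describes dynamic config as resolve-by-precedence, freeze-on-first-turn. Below is a minimal sketch, assuming layers are passed highest-precedence first and that the whole resolved config is frozen (a later commit in this series narrows the frozen part to repo identity and treats the branch as mutable). `CodingConfig`, `resolveConfig`, and `FrozenConfig` are hypothetical names.

```typescript
// Hypothetical config fields; the RFC's real field list is not shown here.
type CodingConfig = { repo: string; branch: string; model: string };

// Layers are ordered highest precedence first (e.g. per-call props, then agent
// defaults, then environment); the first defined value for each key wins.
function resolveConfig(layers: readonly Partial<CodingConfig>[]): Partial<CodingConfig> {
  const resolved: Partial<CodingConfig> = {};
  for (const layer of layers) {
    for (const key of Object.keys(layer) as (keyof CodingConfig)[]) {
      const value = layer[key];
      if (value !== undefined && resolved[key] === undefined) resolved[key] = value;
    }
  }
  return resolved;
}

// Resolves once, on the first turn, and returns the same frozen snapshot on
// every later turn, so mid-conversation config changes cannot leak in.
class FrozenConfig {
  private snapshot: Readonly<Partial<CodingConfig>> | undefined;

  forTurn(layers: readonly Partial<CodingConfig>[]): Readonly<Partial<CodingConfig>> {
    if (this.snapshot === undefined) {
      this.snapshot = Object.freeze(resolveConfig(layers));
    }
    return this.snapshot;
  }
}
```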

* docs(design): drop first-class Chats; threads are a userland directory pattern

The fixed chats_index/ChatSummary schema is the part consumers outgrow
immediately — a coding directory needs repo/branch/status/lastDiff, etc. So:

- rfc-coding-agent §8: reframe "threads" as a userland directory pattern (plain
  Agent + subAgent with domain-specific metadata), not a `Chats` base class. The
  shipped, load-bearing primitives it leans on are unchanged (subAgent + Props,
  parentAgent, RemoteContextProvider).
- rfc-think-multi-session: record the third (now-leaning) answer to open question
  #1 — don't ship a Chats base class; ship primitives + a thin client hook + an
  example.

Co-authored-by: Cursor <cursoragent@cursor.com>
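Under the userland directory pattern, the parent agent keeps its own index of child threads with whatever metadata the domain needs (the commit names repo, branch, status, and lastDiff). The sketch below keeps that index in memory to stay self-contained; in the described design it would presumably live in the parent Agent's storage alongside its `subAgent` handles. `CodingThread` and `ThreadDirectory` are hypothetical.

```typescript
type ThreadStatus = "idle" | "running" | "failed" | "done";

// Domain-specific row: repo/branch/status/lastDiff from the commit message,
// plus a timestamp for ordering.
interface CodingThread {
  id: string;
  repo: string;
  branch: string;
  status: ThreadStatus;
  lastDiff?: string;
  updatedAt: number;
}

class ThreadDirectory {
  private readonly threads = new Map<string, CodingThread>();

  // Inserts or replaces a thread row, stamping it with the update time.
  upsert(thread: Omit<CodingThread, "updatedAt">, now: number = Date.now()): CodingThread {
    const row: CodingThread = { ...thread, updatedAt: now };
    this.threads.set(row.id, row);
    return row;
  }

  // Most recently updated first; omitted filter fields match everything.
  list(filter: { repo?: string; status?: ThreadStatus } = {}): CodingThread[] {
    return Array.from(this.threads.values())
      .filter((t) => filter.repo === undefined || t.repo === filter.repo)
      .filter((t) => filter.status === undefined || t.status === filter.status)
      .sort((a, b) => b.updatedAt - a.updatedAt);
  }
}
```

Because the schema is owned by the consumer, adding a field such as a PR URL is a local change rather than a change to a shared `ChatSummary` type.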

* docs(design): rewrite rfc-coding-agent — own package, AIChatAgent base, pluggable engine

Adversarial review reshaped the design:
- New @cloudflare/coding-agent package, NOT a Think subclass/subpath. Extends
  AIChatAgent, so onChatMessage is the seam and Think core is untouched (drops
  the riskiest piece — the turn-runtime refactor). Honors the AGENTS.md layering
  preference (containers don't belong in the chat base).
- Pluggable engine: CliEngine ships first (lift the example's mapper), HarnessEngine
  is the goal (reuses HarnessAgent's tested stream-mapping + session lifecycle;
  gated on #1829). No speculative multi-CLI adapter interface — extract after codex.
- Durability redesigned around two decoupled lifecycles (DO vs container have
  different shutdown behaviors); reconcile-on-wake; honest that claude -p can't
  resume a killed turn and that re-run can double-apply edits; bound snapshot cost.
- Egress scoped honestly (per-provider, TLS-dependent, OAuth CLIs can't be tokenless).
- Branch is mutable working state (git checkout), only repo identity is frozen.
- Filesystem VFS, preview, git ops, HITL, and Runtime B moved to "Directions"
  (not v1 seams). Added a Testing & CI section.

Co-authored-by: Cursor <cursoragent@cursor.com>
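The redesigned durability model reconciles state when the Durable Object wakes, and is explicit that `claude -p` cannot resume a killed turn and that a re-run can double-apply edits. Below is a minimal sketch of that decision as a pure function, assuming the DO persists a running-turn marker and can probe the container on wake. The types, the `reattach` action, and `reconcileOnWake` are hypothetical.

```typescript
// Hypothetical record the Durable Object persists before starting a CLI turn.
type PersistedTurn = { status: "idle" } | { status: "running"; turnId: string };

// Hypothetical probe of the sandbox taken after the DO wakes up.
type ContainerProbe = { alive: boolean; cliRunning: boolean };

type WakeAction =
  | { action: "none" }
  | { action: "reattach"; turnId: string }
  | { action: "mark-interrupted"; turnId: string; reason: string };

function reconcileOnWake(turn: PersistedTurn, probe: ContainerProbe): WakeAction {
  if (turn.status === "idle") return { action: "none" };
  if (probe.alive && probe.cliRunning) {
    // The container outlived the DO's sleep; resume streaming from it.
    return { action: "reattach", turnId: turn.turnId };
  }
  // claude -p cannot resume a killed turn, and re-running it could apply the
  // same edits twice, so record the interruption rather than retrying.
  return {
    action: "mark-interrupted",
    turnId: turn.turnId,
    reason: probe.alive ? "CLI process exited mid-turn" : "container was recycled",
  };
}
```

Keeping this a pure function of persisted state plus a probe makes each lifecycle combination easy to cover in tests without a live container.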

* docs(design): mark package name resolved in rfc-coding-agent decision

Package name @cloudflare/coding-agent + /claude-code subpath confirmed. Engine
default, snapshot policy, and first-PR scope remain deliberately open.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Sunil Pai <18808+threepointone@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>