feat(v1/runtimes): Modal sandbox snapshot save + per-task resume #1832

Open: samsja wants to merge 1 commit into main from feat/modal-sandbox-snapshot

samsja (Member) commented Jun 23, 2026
What

Adds the ability to save a sandbox's state after a rollout and restart a later rollout from that exact past state, on the Modal runtime.

Opt-in via the Modal runtime config (enable_snapshot=True). Per task.idx:

  1. A rollout runs, and on teardown snapshots its sandbox end-state (memory + filesystem) before terminating, storing the snapshot ref.
  2. The next rollout of that task injects the ref as resume_from and restores an exact clone instead of cold-booting.
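
The two steps above can be sketched without Modal at all. This is only a minimal illustration: `FakeRuntime`, `SnapshotStore`, and `run_rollout` are hypothetical names that do not come from the PR, and the real runtime would call Modal's experimental snapshot API rather than return a counter-based string.

```python
from __future__ import annotations

import asyncio
import itertools


class FakeRuntime:
    """Hypothetical stand-in for a snapshot-capable runtime (not the PR's ModalRuntime)."""

    _ids = itertools.count(1)

    def __init__(self, resume_from: str | None = None):
        self.resume_from = resume_from  # None means a cold boot

    async def snapshot(self) -> str:
        # The real implementation would return a Modal snapshot id here.
        return f"snap-{next(self._ids)}"


class SnapshotStore:
    """In-process map of task index -> latest snapshot ref."""

    def __init__(self) -> None:
        self._refs: dict[int, str] = {}

    def sink(self, task_idx: int, ref: str) -> None:
        self._refs[task_idx] = ref

    def resume_ref(self, task_idx: int) -> str | None:
        return self._refs.get(task_idx)


async def run_rollout(task_idx: int, store: SnapshotStore) -> str | None:
    """Run one rollout and return the ref it resumed from (None on cold boot)."""
    runtime = FakeRuntime(resume_from=store.resume_ref(task_idx))
    try:
        pass  # the rollout body would run here
    finally:
        # Step 1: snapshot the end state on teardown and store the ref.
        store.sink(task_idx, await runtime.snapshot())
    return runtime.resume_from


async def main() -> list[str | None]:
    store = SnapshotStore()
    # Step 2: each later rollout of task 7 resumes from the previous ref.
    return [await run_rollout(7, store) for _ in range(3)]


history = asyncio.run(main())
print(history)  # [None, 'snap-1', 'snap-2']
```

The first rollout cold-boots, and each subsequent one starts from the ref its predecessor stored.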

Changes

  • runtimes/base.py: extends the Runtime contract additively with a supports_snapshot capability flag and an optional async snapshot() -> str (default raises NotImplementedError, so a misconfigured resume fails loudly instead of silently discarding state). Runtime-agnostic.
  • runtimes/modal.py: ModalConfig gains enable_snapshot and resume_from; start() branches to restore a memory+fs clone (SandboxSnapshot.from_id → _experimental_from_snapshot) when resuming, else provisions fresh with snapshotting opt-in; snapshot() wraps _experimental_snapshot().
  • rollout.py: optional snapshot_sink callback; snapshots the runtime end-state in the teardown finally (best-effort: a snapshot failure never fails the rollout or blocks teardown) and reports (task.idx, ref).
  • env.py: in-process per-task snapshot store; injects resume_from and wires the sink in episode().
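
As a rough sketch of how the additive contract and the best-effort teardown fit together (assumed shapes only: the `Runtime` class, `BrokenSnapshotRuntime`, and `run_rollout` below are illustrative and are not copied from the PR's files):

```python
from __future__ import annotations

import asyncio
import logging

logger = logging.getLogger("rollout-sketch")


class Runtime:
    """Hypothetical base class mirroring the contract described above."""

    supports_snapshot = False

    async def snapshot(self) -> str:
        # Default raises so a misconfigured resume fails loudly.
        raise NotImplementedError(f"{type(self).__name__} does not support snapshots")

    async def stop(self) -> None:
        self.stopped = True


class BrokenSnapshotRuntime(Runtime):
    """Claims snapshot support but fails, to exercise the best-effort path."""

    supports_snapshot = True

    async def snapshot(self) -> str:
        raise RuntimeError("snapshot backend unavailable")


async def run_rollout(runtime: Runtime, task_idx: int, snapshot_sink=None) -> str:
    try:
        return "rollout-result"  # the rollout body would run here
    finally:
        if snapshot_sink is not None and runtime.supports_snapshot:
            try:
                snapshot_sink(task_idx, await runtime.snapshot())
            except Exception as exc:
                # Best-effort: log and continue so teardown is never blocked.
                logger.warning("snapshot failed for task %s: %s", task_idx, exc)
        await runtime.stop()


reported: list[tuple[int, str]] = []
runtime = BrokenSnapshotRuntime()
result = asyncio.run(
    run_rollout(runtime, 0, lambda idx, ref: reported.append((idx, ref)))
)
```

Here the snapshot error is logged and swallowed: the rollout still returns its result, the sink is never called, and `stop()` still runs.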

No behavior change for non-snapshot runtimes — everything is gated on enable_snapshot and defaults off.

Verified

  • ruff check + ruff format (pre-commit) pass on all four files.
  • ✅ Compiles; non-snapshot runtimes (subprocess/docker/prime) untouched (guarded by getattr(config, "enable_snapshot", False)).

Not yet verified — needs a live Modal smoke test

  • The experimental Modal APIs (SandboxSnapshot.from_id, _experimental_from_snapshot, _experimental_snapshot) — written to the documented signatures but not executed (no Modal creds in the dev env; the local venv also has an unrelated renderers import mismatch that blocks the test suite here).
  • Whether a restored sandbox re-acquires its port tunnel — Modal forwards encrypted_ports at create-time; if restore doesn't re-establish it, the harness can't reach a server inside the box. Highest-risk unknown.

Known limitations (by design, this pass)

  • In-process store only: the per-task store lives on the Environment instance, so resume works within one process but does not survive a restart or reach distributed prime-rl workers. Persistent store is the follow-up.
  • n>1 concurrency: sibling rollouts of one task all start before any snapshot completes, so last-to-finish wins the stored ref. Clean for n=1.
  • Modal memory snapshots expire after 7 days and pin the sandbox to one instance type (upstream Modal constraints).
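
The n>1 limitation can be reproduced with a toy model. The sketch below assumes a single ref slot per task and uses `asyncio.sleep` as a stand-in for rollout duration; `sibling_rollout` and `snapshot_refs` are illustrative names, not the PR's code.

```python
from __future__ import annotations

import asyncio

snapshot_refs: dict[int, str] = {}


async def sibling_rollout(task_idx: int, name: str, duration: float) -> str | None:
    # resume_from is read at start, before any sibling has stored a ref.
    resumed_from = snapshot_refs.get(task_idx)
    await asyncio.sleep(duration)  # stand-in for the rollout body
    snapshot_refs[task_idx] = f"snap-{name}"  # teardown overwrites the single slot
    return resumed_from


async def main() -> list[str | None]:
    return await asyncio.gather(
        sibling_rollout(0, "a", 0.06),  # slowest sibling, finishes last
        sibling_rollout(0, "b", 0.02),
        sibling_rollout(0, "c", 0.04),
    )


started_from = asyncio.run(main())
print(started_from)      # [None, None, None]
print(snapshot_refs[0])  # snap-a
```

All three siblings cold-boot, and the stored ref belongs to whichever sibling tears down last, regardless of which end state is most useful.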

🤖 Generated with Claude Code


Note

Medium Risk
Uses experimental Modal snapshot APIs and unverified restore/tunnel behavior; snapshot failures are swallowed so resume may silently fall back to cold start.

Overview
Adds opt-in Modal sandbox snapshot/resume so a later rollout of the same task can resume from the previous rollout’s end state instead of cold-starting from a fresh image.

The Runtime contract gains supports_snapshot and async snapshot() -> str (default raises). Modal adds enable_snapshot / resume_from on config, restores via experimental snapshot APIs on start(), and captures state in snapshot(). Rollout takes an optional snapshot_sink and best-effort snapshots before stop() in teardown. Environment keeps an in-process task.idx → ref map, injects resume_from when building episodes, and wires the sink—gated on enable_snapshot so other runtimes are unchanged.

Limitations: store is in-process only; concurrent n>1 rollouts race on the stored ref; restored sandboxes’ port tunnels are not yet verified live.

Reviewed by Cursor Bugbot for commit 86ab572. Bugbot is set up for automated code reviews on this repo.

Note

Add Modal sandbox snapshot save and per-task resume to the v1 runtime

  • Adds enable_snapshot and resume_from fields to ModalConfig and implements snapshot() on ModalRuntime, which captures live sandbox state via Modal's experimental API and returns a snapshot object ID.
  • At the end of each rollout, if the runtime supports snapshots, Rollout.run calls snapshot() and passes the result to an optional snapshot_sink(task_idx, ref) callback; failures are logged as warnings and do not fail the rollout.
  • Environment stores per-task snapshot refs in _snapshot_refs and, when enable_snapshot is set, injects resume_from into the runtime_config for subsequent rollouts on the same task so the sandbox is restored from the snapshot instead of provisioned fresh.
  • Behavioral Change: sandboxes created with enable_snapshot=True call Modal's _experimental_enable_snapshot API; make_directory(workdir) is skipped when resuming from a snapshot.

Macroscope summarized 86ab572.

Add an optional snapshot capability to the Runtime contract and implement it on
the Modal runtime, so a rollout can save its sandbox's end state and a later
rollout of the same task can restart from that exact past state.

- base.py: `supports_snapshot` flag + optional `snapshot()` method (default
  raises, so a misconfigured resume fails loudly rather than losing state).
- modal.py: `enable_snapshot`/`resume_from` config; resume-aware `start()` that
  restores a memory+filesystem clone via Modal's experimental snapshot API;
  `snapshot()` returns the snapshot id.
- rollout.py: optional `snapshot_sink` — snapshots the runtime end state on
  teardown (best-effort; never fails the rollout) and reports `(task.idx, ref)`.
- env.py: in-process per-task snapshot store; injects `resume_from` and wires
  the sink in `episode()`. Opt-in via the runtime's `enable_snapshot`.

In-process store only (does not survive restart or reach other workers).
Experimental Modal APIs and tunnel-on-restore behavior still need a live smoke
test; non-snapshot runtimes are unaffected (all defaults off).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Reviewed by Cursor Bugbot for commit 86ab572.

)
self._sandbox = await modal.Sandbox._experimental_from_snapshot.aio(
    snapshot
)

Resume omits sandbox port setup

High Severity

Fresh Modal sandboxes declare encrypted_ports for SERVICE_PORT so expose can resolve public tunnels, but the resume_from path restores only via _experimental_from_snapshot without that setup. Resumed rollouts may get no tunnel for the service port, breaking reachability for in-sandbox MCP tools and yielding invalid URLs downstream.

)
self._sandbox = await modal.Sandbox._experimental_from_snapshot.aio(
    snapshot
)

Resume skips snapshot enable flag

Medium Severity

When enable_snapshot is on, new sandboxes pass _experimental_enable_snapshot at create time, but sandboxes restored with resume_from never receive that flag. A resumed rollout may fail best-effort snapshot() at teardown, leaving the per-task store on an older ref and breaking multi-step resume chains.
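
To make the reviewer's concern concrete, here is a small simulation of a resume chain. It assumes, as the comment suggests may happen, that a restored sandbox without the flag cannot be snapshotted; this has not been verified against Modal. `SimulatedSandbox` and `run_chain` are hypothetical and model sandbox state as a plain integer.

```python
from __future__ import annotations


class SimulatedSandbox:
    """Hypothetical model: only sandboxes with the flag set can snapshot."""

    def __init__(self, state: int, snapshot_enabled: bool):
        self.state = state
        self.snapshot_enabled = snapshot_enabled

    def snapshot(self) -> int:
        if not self.snapshot_enabled:
            raise RuntimeError("snapshotting was not enabled for this sandbox")
        return self.state


def run_chain(steps: int, propagate_flag: bool) -> list[int]:
    """Return the state each rollout started from."""
    stored: int | None = None
    starts: list[int] = []
    for _ in range(steps):
        if stored is None:
            # Fresh sandbox: created with snapshotting enabled.
            box = SimulatedSandbox(state=0, snapshot_enabled=True)
        else:
            # Resumed sandbox: flag only present if the restore path sets it.
            box = SimulatedSandbox(state=stored, snapshot_enabled=propagate_flag)
        starts.append(box.state)
        box.state += 1  # the rollout advances the sandbox state
        try:
            stored = box.snapshot()
        except RuntimeError:
            pass  # best-effort teardown keeps the older ref
    return starts


print(run_chain(4, propagate_flag=False))  # [0, 1, 1, 1]
print(run_chain(4, propagate_flag=True))   # [0, 1, 2, 3]
```

Under that assumption the chain stalls after the first resume: every later rollout restarts from the same older ref, and because the failure is swallowed nothing surfaces it.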

resume_from: str | None = None
"""A snapshot id from a prior `snapshot()` to restore instead of provisioning fresh: the
new sandbox is an exact clone of the snapshotted one (process state, packages, workdir).
None provisions a clean sandbox from `image`."""

Snapshot feature lacks docs update

Low Severity

This PR adds user-facing Modal snapshot/resume configuration (enable_snapshot, resume_from) and Environment per-task resume wiring, but no updates appear in the documented reference surfaces (docs/overview.md, docs/environments.md, docs/reference.md, or docs/faqs.md for limitations).

Triggered by project rule: BugBot Instructions

macroscopeapp Bot commented Jun 23, 2026

Approvability

Verdict: Needs human review

New feature adding Modal sandbox snapshot/resume capabilities with unresolved HIGH severity review comment about resumed sandboxes potentially missing port configuration, which could break service connectivity.
