feat(v1/runtimes): Modal sandbox snapshot save + per-task resume #1832
samsja wants to merge 1 commit into
Add an optional snapshot capability to the Runtime contract and implement it on the Modal runtime, so a rollout can save its sandbox's end state and a later rollout of the same task can restart from that exact past state.

- base.py: `supports_snapshot` flag + optional `snapshot()` method (default raises, so a misconfigured resume fails loudly rather than losing state).
- modal.py: `enable_snapshot`/`resume_from` config; resume-aware `start()` that restores a memory+filesystem clone via Modal's experimental snapshot API; `snapshot()` returns the snapshot id.
- rollout.py: optional `snapshot_sink`; snapshots the runtime end state on teardown (best-effort; never fails the rollout) and reports `(task.idx, ref)`.
- env.py: in-process per-task snapshot store; injects `resume_from` and wires the sink in `episode()`.

Opt-in via the runtime's `enable_snapshot`. In-process store only (does not survive restart or reach other workers). Experimental Modal APIs and tunnel-on-restore behavior still need a live smoke test; non-snapshot runtimes are unaffected (all defaults off).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
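For readers skimming the commit message, the `base.py` bullet can be pictured roughly as below. This is a minimal sketch, not the code in this PR: the `Runtime` class body, `PlainRuntime`, `FakeSnapshotRuntime`, `try_snapshot`, and the id `"snap-0001"` are all hypothetical names made up for illustration. Only the `supports_snapshot` flag, the `async snapshot() -> str` signature, and the "default raises" behavior come from the description above.

```python
from __future__ import annotations

import asyncio


class Runtime:
    """Hypothetical stand-in for the base runtime contract."""

    # Capability flag: runtimes that can snapshot override this to True.
    supports_snapshot: bool = False

    async def snapshot(self) -> str:
        # Default raises so a caller that expects a snapshot fails loudly.
        raise NotImplementedError(
            f"{type(self).__name__} does not support snapshots"
        )


class PlainRuntime(Runtime):
    """A runtime that leaves the snapshot defaults untouched."""


class FakeSnapshotRuntime(Runtime):
    """Illustrative runtime that opts in and returns a made-up snapshot id."""

    supports_snapshot = True

    async def snapshot(self) -> str:
        return "snap-0001"  # placeholder id, not a real Modal value


async def try_snapshot(runtime: Runtime) -> str | None:
    # Callers can check the flag first instead of catching the exception.
    if not runtime.supports_snapshot:
        return None
    return await runtime.snapshot()
```

Checking the flag keeps non-snapshot runtimes on their existing path, while calling `snapshot()` directly on such a runtime still raises `NotImplementedError`.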
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Reviewed by Cursor Bugbot for commit 86ab572.
    )
    self._sandbox = await modal.Sandbox._experimental_from_snapshot.aio(
        snapshot
    )
Resume omits sandbox port setup
High Severity
Fresh Modal sandboxes declare encrypted_ports for SERVICE_PORT so expose can resolve public tunnels, but the resume_from path restores only via _experimental_from_snapshot without that setup. Resumed rollouts may get no tunnel for the service port, breaking reachability for in-sandbox MCP tools and yielding invalid URLs downstream.
    )
    self._sandbox = await modal.Sandbox._experimental_from_snapshot.aio(
        snapshot
    )
Resume skips snapshot enable flag
Medium Severity
When enable_snapshot is on, new sandboxes pass _experimental_enable_snapshot at create time, but sandboxes restored with resume_from never receive that flag. A resumed rollout may fail best-effort snapshot() at teardown, leaving the per-task store on an older ref and breaking multi-step resume chains.
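The failure mode described here can be sketched without Modal at all. The example below is a toy model under stated assumptions: `FlakyRuntime`, `run_rollout`, `demo`, and the id `"snap-a"` are hypothetical, and the teardown shape (snapshot in `finally`, warning on failure, then `stop()`) follows the PR description rather than the actual `rollout.py`. It shows how a swallowed snapshot failure on a resumed sandbox leaves the per-task store pointing at the older ref.

```python
from __future__ import annotations

import asyncio
import logging

logger = logging.getLogger("rollout_sketch")


class FlakyRuntime:
    """Hypothetical runtime whose snapshot() fails when no ref is available."""

    def __init__(self, ref: str | None):
        self._ref = ref
        self.stopped = False

    async def snapshot(self) -> str:
        if self._ref is None:
            raise RuntimeError("snapshotting not enabled on this sandbox")
        return self._ref

    async def stop(self) -> None:
        self.stopped = True


async def run_rollout(task_idx, runtime, snapshot_sink=None):
    try:
        pass  # the rollout body would run here
    finally:
        if snapshot_sink is not None:
            try:
                snapshot_sink(task_idx, await runtime.snapshot())
            except Exception as exc:  # best-effort: never fail the rollout
                logger.warning("snapshot failed for task %s: %s", task_idx, exc)
        await runtime.stop()  # teardown still runs after a failed snapshot


def demo():
    refs: dict[int, str] = {}

    def sink(task_idx, ref):
        refs[task_idx] = ref

    # First rollout snapshots fine; the resumed one cannot snapshot.
    asyncio.run(run_rollout(7, FlakyRuntime("snap-a"), sink))
    resumed = FlakyRuntime(None)
    asyncio.run(run_rollout(7, resumed, sink))
    return refs, resumed.stopped
```

In this model the second rollout finishes cleanly, yet the store still holds the first rollout's ref, which is the stale-chain behavior the comment warns about.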
    resume_from: str | None = None
    """A snapshot id from a prior `snapshot()` to restore instead of provisioning fresh: the
    new sandbox is an exact clone of the snapshotted one (process state, packages, workdir).
    None provisions a clean sandbox from `image`."""
Snapshot feature lacks docs update
Low Severity
This PR adds user-facing Modal snapshot/resume configuration (enable_snapshot, resume_from) and Environment per-task resume wiring, but no updates appear in the documented reference surfaces (docs/overview.md, docs/environments.md, docs/reference.md, or docs/faqs.md for limitations).
Triggered by project rule: BugBot Instructions
Approvability
Verdict: Needs human review
New feature adding Modal sandbox snapshot/resume capabilities with an unresolved HIGH severity review comment about resumed sandboxes potentially missing port configuration, which could break service connectivity.


What
Adds the ability to save a sandbox's state after a rollout and restart a later rollout from that exact past state, on the Modal runtime.
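Opt-in via the Modal runtime config (`enable_snapshot=True`). Per `task.idx`: a rollout's end state is snapshotted on teardown, and a later rollout of the same task gets `resume_from` and restores an exact clone instead of cold-booting.

Changes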
- `runtimes/base.py`: extends the `Runtime` contract additively with a `supports_snapshot` capability flag and an optional `async snapshot() -> str` (default raises `NotImplementedError`, so a misconfigured resume fails loudly instead of silently discarding state). Runtime-agnostic.
- `runtimes/modal.py`: `ModalConfig` gains `enable_snapshot` and `resume_from`; `start()` branches to restore a memory+fs clone (`SandboxSnapshot.from_id` → `_experimental_from_snapshot`) when resuming, else provisions fresh with snapshotting opt-in; `snapshot()` wraps `_experimental_snapshot()`.
- `rollout.py`: optional `snapshot_sink` callback; snapshots the runtime end-state in the teardown `finally` (best-effort: a snapshot failure never fails the rollout or blocks teardown) and reports `(task.idx, ref)`.
- `env.py`: in-process per-task snapshot store; injects `resume_from` and wires the sink in `episode()`.

No behavior change for non-snapshot runtimes: everything is gated on `enable_snapshot` and defaults off.

Verified
- `ruff check` + `ruff format` (pre-commit) pass on all four files.
- … `getattr(config, "enable_snapshot", False)`).

Not yet verified (needs a live Modal smoke test)
- Experimental Modal APIs (`SandboxSnapshot.from_id`, `_experimental_from_snapshot`, `_experimental_snapshot`): written to the documented signatures but not executed (no Modal creds in the dev env; the local venv also has an unrelated `renderers` import mismatch that blocks the test suite here).
- Tunnel on restore: fresh sandboxes declare `encrypted_ports` at create-time; if restore doesn't re-establish it, the harness can't reach a server inside the box. Highest-risk unknown.

Known limitations (by design, this pass)
- The snapshot store is in-process, held on the `Environment` instance, so resume works within one process but does not survive a restart or reach distributed prime-rl workers. Persistent store is the follow-up.
- `n>1` concurrency: sibling rollouts of one task all start before any snapshot completes, so last-to-finish wins the stored ref. Clean for `n=1`.

🤖 Generated with Claude Code
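The `n>1` limitation is easy to reproduce in miniature. The sketch below is a toy model, not code from this PR: `rollout`, `run_siblings`, the sleep durations, and the `snap-…` ids are all made up, and a plain dict stands in for the per-task store. Each sibling reads the store when it starts and writes its own ref when it finishes.

```python
import asyncio


async def rollout(task_idx, name, duration, store):
    # Every sibling reads the store at start, before any snapshot exists.
    resumed_from = store.get(task_idx)
    await asyncio.sleep(duration)  # stands in for the rollout itself
    store[task_idx] = f"snap-{name}"  # teardown snapshot overwrites the ref
    return resumed_from


async def run_siblings():
    store = {}
    started = await asyncio.gather(
        rollout(0, "a", 0.03, store),  # slowest sibling, finishes last
        rollout(0, "b", 0.01, store),
        rollout(0, "c", 0.02, store),
    )
    return started, store
```

Under these assumptions no sibling resumes from another's snapshot, and the stored ref ends up being whichever rollout tore down last, regardless of which end state would be preferable.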
Note
Medium Risk
Uses experimental Modal snapshot APIs and unverified restore/tunnel behavior; snapshot failures are swallowed so resume may silently fall back to cold start.
Overview
Adds opt-in Modal sandbox snapshot/resume so a later rollout of the same task can start from the previous rollout’s end state instead of a fresh image.
The `Runtime` contract gains `supports_snapshot` and `async snapshot() -> str` (default raises). Modal adds `enable_snapshot` / `resume_from` on config, restores via experimental snapshot APIs on `start()`, and captures state in `snapshot()`. `Rollout` takes an optional `snapshot_sink` and best-effort snapshots before `stop()` in teardown. `Environment` keeps an in-process `task.idx → ref` map, injects `resume_from` when building episodes, and wires the sink, gated on `enable_snapshot` so other runtimes are unchanged.

Limitations: store is in-process only; concurrent `n>1` rollouts race on the stored ref; restored sandboxes’ port tunnels are not yet verified live.

Reviewed by Cursor Bugbot for commit 86ab572.
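The `Environment` wiring summarized above might look something like the following. Treat it as a rough sketch under assumptions: `SketchConfig`, `SketchEnvironment`, `config_for`, and `sink_for_episode` are hypothetical names, and the real `env.py` may structure this differently. The pieces taken from the PR text are the `enable_snapshot` / `resume_from` fields, the `_snapshot_refs` map, and the `getattr(config, "enable_snapshot", False)` gate.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical runtime config; only the two fields mirror the PR."""

    enable_snapshot: bool = False
    resume_from: str | None = None


class SketchEnvironment:
    def __init__(self, runtime_config):
        self.runtime_config = runtime_config
        self._snapshot_refs: dict[int, str] = {}  # task.idx -> snapshot ref

    def _snapshots_enabled(self) -> bool:
        # getattr keeps configs without the field (other runtimes) working.
        return bool(getattr(self.runtime_config, "enable_snapshot", False))

    def config_for(self, task_idx: int):
        ref = self._snapshot_refs.get(task_idx)
        if not self._snapshots_enabled() or ref is None:
            return self.runtime_config  # fresh provision, config unchanged
        return replace(self.runtime_config, resume_from=ref)

    def sink_for_episode(self):
        # No sink at all when snapshots are off, so rollouts skip the step.
        return self._record if self._snapshots_enabled() else None

    def _record(self, task_idx: int, ref: str) -> None:
        self._snapshot_refs[task_idx] = ref
```

Returning a copy of the config via `dataclasses.replace` keeps the shared base config untouched, so tasks without a stored ref still provision fresh.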
Note
Add Modal sandbox snapshot save and per-task resume to the v1 runtime
- Adds `enable_snapshot` and `resume_from` fields to `ModalConfig` and implements `snapshot()` on `ModalRuntime`, which captures live sandbox state via Modal's experimental API and returns a snapshot object ID.
- `Rollout.run` calls `snapshot()` and passes the result to an optional `snapshot_sink(task_idx, ref)` callback; failures are logged as warnings and do not fail the rollout.
- `Environment` stores per-task snapshot refs in `_snapshot_refs` and, when `enable_snapshot` is set, injects `resume_from` into the `runtime_config` for subsequent rollouts on the same task so the sandbox is restored from the snapshot instead of provisioned fresh.
- Sandboxes created with `enable_snapshot=True` call Modal's `_experimental_enable_snapshot` API; `make_directory(workdir)` is skipped when resuming from a snapshot.

Macroscope summarized 86ab572.