feat: configurable tunneling in interception #1830
Make *how the host interception server is reached from a remote harness runtime* a first-class, pluggable choice instead of a hardcoded prime_tunnel call. `EnvConfig.interception` is a discriminated union (`--interception.type prime|modal|url`) over three backends, with `multiplex` as the shared field (moved off the root EnvConfig):
- prime (default): expose the host port via prime_tunnel (frpc). Unchanged behavior; works from any host with prime credentials.
- modal: expose it via Modal's own forwarding (modal.forward), for a Modal-hosted trainer/eval (modal.forward only works inside a container).
- url: bring your own reverse proxy. Open no tunnel; reach the server at a public `url` that you front the fixed local `port` with. One URL == one server.
`expose_interception(config, port, *, is_local)` is the single place a type maps to a reachable URL (localhost when the runtime is local, whatever the type). InterceptionPool now takes the config; InterceptionServer can bind a fixed port for the BYO target.
BREAKING: EnvConfig.multiplex -> EnvConfig.interception.multiplex. The default (PrimeInterceptionConfig(multiplex=32)) keeps existing runs unchanged; only callers that set multiplex migrate (`--multiplex N` -> `--interception.multiplex N`).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
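A minimal sketch of how the `type` discriminator picks one of the three config variants described above. It uses plain `dataclasses` as an assumption (the project's actual config library is not shown here), and `parse_interception` / `_BY_TYPE` are hypothetical names; only the class names, `url`, `port`, and the `multiplex=32` default come from the commit.

```python
from dataclasses import dataclass
from typing import Literal, Union


# Hypothetical stand-ins for the PR's config classes: field names follow the
# commit message, but the real classes use the project's own config library.
@dataclass
class PrimeInterceptionConfig:
    type: Literal["prime"] = "prime"
    multiplex: int = 32  # rollouts per interception server


@dataclass
class ModalInterceptionConfig:
    type: Literal["modal"] = "modal"
    multiplex: int = 32


@dataclass
class UrlInterceptionConfig:
    url: str   # public URL your reverse proxy serves
    port: int  # fixed local port the proxy forwards to
    type: Literal["url"] = "url"
    multiplex: int = 32


InterceptionConfig = Union[PrimeInterceptionConfig, ModalInterceptionConfig, UrlInterceptionConfig]

# The `type` field is the discriminator: it alone decides which variant to build.
_BY_TYPE = {
    "prime": PrimeInterceptionConfig,
    "modal": ModalInterceptionConfig,
    "url": UrlInterceptionConfig,
}


def parse_interception(raw: dict) -> InterceptionConfig:
    """Build the variant named by raw['type'], defaulting to prime."""
    fields = dict(raw)
    kind = fields.pop("type", "prime")
    if kind not in _BY_TYPE:
        raise ValueError(f"unknown interception type: {kind!r}")
    return _BY_TYPE[kind](**fields)
```

Keeping `multiplex` on every variant mirrors this first commit; later commits rename `url` to `custom`, drop `modal`, and move `multiplex` onto the prime variant only.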
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the host-reachability logic out of free functions (expose_interception / _modal_host_endpoint / bind_host) into a lightweight `Tunnel` class hierarchy, parallel to the Runtime hierarchy: `Tunnel` (the contract) + `PrimeTunnel`, `ModalTunnel`, `CustomTunnel`. Each owns its `bind_host`, `bind_port`, `single_server`, and `expose()`; the base `reachable()` handles the local short-circuit. Each `InterceptionConfig` owns its tunnel via `config.tunnel()`, so the pool is branch-free — it builds the tunnel once and drives the server through `bind_port`/`bind_host`/`reachable()`/`single_server`. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
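The hierarchy this commit describes can be sketched as follows, assuming `expose()` returns an async context manager (later commits confirm that shape). Only `Tunnel`, `CustomTunnel`, `bind_host`, `bind_port`, `single_server`, `expose()` and `reachable()` come from the commit; the bodies, the `127.0.0.1` default, and the `demo()` helper are illustrative. `PrimeTunnel` is omitted because it needs `prime_tunnel` credentials.

```python
import asyncio
from abc import ABC, abstractmethod
from contextlib import AbstractAsyncContextManager, asynccontextmanager
from typing import AsyncIterator, ClassVar, Optional


class Tunnel(ABC):
    """Contract: where the host server binds and how a remote runtime reaches it."""

    bind_host: ClassVar[str] = "127.0.0.1"
    single_server: ClassVar[bool] = False

    @property
    def bind_port(self) -> Optional[int]:
        return None  # None: let the OS pick an ephemeral port

    @abstractmethod
    def expose(self, port: int) -> AbstractAsyncContextManager:
        """Return an async context manager that yields a remotely reachable URL."""

    @asynccontextmanager
    async def reachable(self, port: int, *, is_local: bool) -> AsyncIterator[str]:
        if is_local:
            # Local short-circuit in the base class: no tunnel, whatever the type.
            yield f"http://localhost:{port}"
            return
        async with self.expose(port) as url:
            yield url


class CustomTunnel(Tunnel):
    """BYO endpoint: opens nothing and hands back the configured public URL."""

    single_server = True  # one fixed port, so only one server

    def __init__(self, url: str, port: int) -> None:
        self.url = url
        self.port = port

    @property
    def bind_port(self) -> Optional[int]:
        return self.port

    @asynccontextmanager
    async def expose(self, port: int) -> AsyncIterator[str]:
        yield self.url


async def demo() -> tuple:
    tunnel = CustomTunnel("https://proxy.example", 8765)
    async with tunnel.reachable(8765, is_local=True) as local_url:
        pass
    async with tunnel.reachable(8765, is_local=False) as remote_url:
        pass
    return local_url, remote_url


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Putting the local short-circuit in the base class keeps every subclass focused on remote exposure; a later commit moves that check out of the tunnel entirely.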
Split the single tunnel module into a package mirroring runtimes/: base.py
(the Tunnel contract), prime.py / modal.py / custom.py (the implementations),
and __init__.py with the make_tunnel(config) factory + _tunnel_cls dispatch.
Tunnels are built from a config via make_tunnel (no method on the config), the
host-side counterpart to make_runtime. Also tighten the EnvConfig.interception
docstring ("from a remote runtime"). No behavior change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Match the tunnel class name: `UrlInterceptionConfig` -> `CustomInterceptionConfig`, discriminator `type="url"` -> `"custom"` (so `--interception.type prime|modal|custom`), served by `CustomTunnel`. The `url` field is unchanged (it is a URL). Also trim the config docstrings to one-liners — the per-type caveats (prime credentials, modal-container-only) live on the Tunnel classes now — and drop the `multiplex` paragraph from the config module docstring. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…untime import
- Derive `Tunnel.single_server` from `bind_port` (a fixed bind port can host only one server, so it's a single endpoint) instead of a separate ClassVar + `CustomTunnel` override. `CustomTunnel` now only overrides `bind_port`.
- `ModalTunnel.expose` uses `modal.forward` directly (it is already an async context manager) instead of borrowing `runtimes.base.open_tunnel` for retry/wrapping, so `modal.py` imports nothing from `runtimes`. `prime.py` still reuses `host_endpoint`, the rate-limited prime_tunnel primitive shared with the tool-serving `reachable_url` path (the correct higher->lower dependency; duplicating it would also pull in the shared tunnel limiter).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
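A sketch of deriving `single_server` from `bind_port` instead of declaring it, assuming `bind_port` returns `None` for an ephemeral port. The `PrimeTunnel` here is an empty stand-in, not the real class.

```python
from typing import Optional


class Tunnel:
    bind_host = "127.0.0.1"

    @property
    def bind_port(self) -> Optional[int]:
        return None  # ephemeral port: any number of servers can be started

    @property
    def single_server(self) -> bool:
        # Derived, not declared: a fixed bind port can host only one server.
        return self.bind_port is not None


class PrimeTunnel(Tunnel):
    """Stand-in: keeps the ephemeral default, so it stays multi-server."""


class CustomTunnel(Tunnel):
    def __init__(self, port: int) -> None:
        self._port = port

    @property
    def bind_port(self) -> Optional[int]:
        return self._port  # the only override CustomTunnel needs
```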
… tunnel
- Relocate `host_endpoint` + its `open_tunnel` retry helper out of `runtimes/base.py` into `interception/tunnel/prime.py` (the prime_tunnel primitive, alongside `PrimeTunnel`). `PrimeTunnel.expose` uses it directly; the tool-serving `reachable_url` (still in runtimes) imports it lazily to avoid an import cycle. The host-wide `TUNNEL_LIMITER` stays in `runtimes.limiters` (a shared rate-limit resource, used by sandbox creation too) and is imported lazily inside `host_endpoint`. So the tunnel package no longer imports a runtime *method*.
- `InterceptionServer(tunnel)` binds where its tunnel says (`bind_host` / `bind_port`), instead of the pool threading those through as separate kwargs; no tunnel = loopback ephemeral (the bare-server default).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…or at call sites
- `Tunnel` is generic over its config (`Tunnel[ConfigT]`), so `CustomTunnel`'s `self.config` is typed without a quoted forward-ref. `single_server` is a ClassVar again (the pool reads it off the tunnel class).
- One dispatch, `tunnel_cls(config) -> type[Tunnel]` (dropped `make_tunnel`).
- `InterceptionPool.tunnel_cls` (public) holds the class and builds one tunnel instance per server. `InterceptionServer(tunnel)` holds it as `server.tunnel`, binds where the tunnel says, and exposes itself via `server.reachable(is_local)`.
- `TunnelError` is no longer raised inside the tunnel impls: `PrimeTunnel.expose` keeps only the inlined prime_tunnel mechanism + retry and lets a terminal failure propagate raw. The call sites classify it: `server.reachable` and `reachable_url` wrap tunnel *setup* (scoped via an exit stack, so a rollout-body error during `yield` isn't miscategorized).
- `reachable_url` (runtimes) reaches a host service via `PrimeTunnel().reachable`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
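The setup-scoped `TunnelError` described above can be sketched with `contextlib.AsyncExitStack`: only entering the tunnel is wrapped in `try`, so an exception raised by the rollout body while suspended at `yield` passes through untouched. `reachable`, `working_expose`, `failing_expose`, and `demo` are hypothetical names for illustration; the real call sites are `server.reachable` and `reachable_url`.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager
from typing import AsyncIterator, Callable


class TunnelError(RuntimeError):
    """Raised only when exposing the port fails, never for a rollout-body error."""


@asynccontextmanager
async def reachable(expose: Callable, port: int) -> AsyncIterator[str]:
    async with AsyncExitStack() as stack:
        try:
            # Only tunnel *setup* sits inside the try.
            url = await stack.enter_async_context(expose(port))
        except Exception as exc:
            raise TunnelError(f"could not expose port {port}") from exc
        # The body runs outside the try: its errors propagate unchanged,
        # while the stack still tears the tunnel down on the way out.
        yield url


@asynccontextmanager
async def working_expose(port: int) -> AsyncIterator[str]:
    yield f"https://tunnel.example/{port}"  # hypothetical URL


@asynccontextmanager
async def failing_expose(port: int) -> AsyncIterator[str]:
    raise ConnectionError("tunnel setup refused")
    yield  # unreachable; keeps this an async generator


async def demo() -> tuple:
    try:
        async with reachable(failing_expose, 8000):
            pass
    except TunnelError as exc:
        setup = type(exc.__cause__).__name__
    try:
        async with reachable(working_expose, 8000):
            raise ValueError("rollout body failed")
    except ValueError:
        body = "ValueError"
    except TunnelError:
        body = "miscategorized"
    return setup, body


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

`demo()` returns `('ConnectionError', 'ValueError')`: the setup failure is classified as a `TunnelError` with the original cause chained, and the body's `ValueError` is not.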
The prime_tunnel host bridge moved from runtimes.base.host_endpoint into PrimeTunnel (interception.tunnel); update the doc/comment references that still named the removed symbol. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lout path
- Remove the `modal` interception type (ModalInterceptionConfig / ModalTunnel / tunnel/modal.py). modal.forward only works from inside a Modal container, so it was awkward to ship; the modal *runtime* (harness sandboxes) is unaffected. Interception types are now `prime` (default) and `custom`.
- Drop the pool-less fallback in `Rollout._serve_interception` (it hardcoded a PrimeTunnel, ignoring the configured type). `episode()` only ever runs inside `Environment.serving()`, so the interception pool is always present: inline `self.interception.acquire(session)` and remove the dead branch + its imports.
- Move `TUNNEL_LIMITER` (+ public `TUNNELS_PER_MIN`) into the tunnel subpackage (`interception/tunnel/prime.py`), built from `runtimes.limiters.creation_limiter`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
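The PR does not show how `runtimes.limiters.creation_limiter` is implemented. As an assumption, a host-wide cap like `TUNNEL_LIMITER` (512/min per the PR summary) could be a sliding-window limiter; `SlidingWindowLimiter` is a hypothetical name, and the injectable `clock` exists only to make it testable.

```python
import asyncio
import time
from collections import deque
from typing import Callable, Deque

TUNNELS_PER_MIN = 512  # host-wide cap quoted in the PR summary


class SlidingWindowLimiter:
    """At most `limit` acquisitions in any `window`-second span (hypothetical)."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if limit < 1:
            raise ValueError("limit must be >= 1")
        self.limit = limit
        self.window = window
        self.clock = clock
        self._stamps: Deque[float] = deque()  # times of recent acquisitions

    def try_acquire(self) -> bool:
        now = self.clock()
        # Forget acquisitions that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False

    async def acquire(self) -> None:
        # try_acquire never awaits, so check-and-record is atomic on one event loop.
        while not self.try_acquire():
            # Sleep until the oldest acquisition leaves the window, then retry.
            await asyncio.sleep(self.window - (self.clock() - self._stamps[0]))


TUNNEL_LIMITER = SlidingWindowLimiter(TUNNELS_PER_MIN)
```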
Move trailing comments to their own line / use a local for the repeated bind_host so ruff format leaves the interception server + pool untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…serving()
run() calls self.interception.acquire() unconditionally, so the pool must always be present. It is (episode() only runs inside Environment.serving()), so make the contract explicit: Rollout.interception is now required (InterceptionPool, not Optional) and episode() raises a clear error if serving() isn't active, instead of a later AttributeError. (Does not restore the old pool-less fallback, which hardcoded PrimeTunnel and ignored interception.type.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…face)
For a host the harness can reach directly (a public IP, or a trusted private
network), `direct` binds the interception server(s) on `bind_host` (0.0.0.0 by
default) and reaches each at `http://{host}:{port}` — no tunnel, no proxy. It
multiplexes like prime (a server per `multiplex` rollouts, each on its own
ephemeral port), so the firewall must allow those ports. Plaintext HTTP carrying
the per-rollout secret, so it's for trusted networks only (documented).
`Tunnel.bind_host` becomes a per-instance property (was a ClassVar) so `DirectTunnel`
derives it from config; `tunnel_cls` dispatches `direct` -> `DirectTunnel`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…default bind_host now defaults to host (None -> host) instead of 0.0.0.0, so a direct server listens only on the interface it advertises — bind a private NIC and the port is never opened on the public NIC. host (the dialable URL address) and bind_host (the local listen interface) stay distinct (host can't be 0.0.0.0); bind_host is an optional override (0.0.0.0 for all interfaces, or a different local IP behind a 1:1 NAT). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…irect` type A reachable host doesn't need a separate type — `custom` with `url=http://<host>:<port>` is direct exposure, it just needed to bind a reachable interface instead of loopback. So add `bind_host` to CustomInterceptionConfig (default 127.0.0.1 for a same-host reverse proxy; set 0.0.0.0 / a public/LAN IP to expose `port` directly, no proxy) and remove the `direct` type (DirectTunnel / DirectInterceptionConfig / tunnel/direct.py). Interception types: prime, custom. Direct binds are plaintext HTTP carrying the per-rollout secret — trusted-network only (documented on the field). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Match the v0 cli_agent_env path (verifiers/utils/interception_utils.py binds 0.0.0.0 unconditionally): CustomTunnel binds all interfaces, so `url` is the only knob — a reverse proxy you front it with, or a direct `http://<host>:<port>` on a reachable host, both work with no extra field. Drop CustomInterceptionConfig.bind_host and revert Tunnel.bind_host to a ClassVar (prime stays 127.0.0.1, stricter than v0). The interception port is plaintext (auth'd by the per-rollout secret); fronting it with TLS/firewall on an untrusted network is the operator's job (security TBD). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- type Environment._interception as non-optional InterceptionPool (set in serving; drop the episode None-guard and the serving-finally reset)
- type tunnel_cls / InterceptionPool config params as InterceptionConfig (was BaseInterceptionConfig)
- unquote server.py forward-ref annotations via `from __future__ import annotations`
- trim the EnvConfig.interception docstring
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The serving() finally reset _shared_urls but left _interception pointing at the torn-down pool. del it instead so misuse outside serving() fails loudly (AttributeError, like before the first serving()) without reintroducing None. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
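A synchronous sketch of the `del`-on-exit lifecycle this commit describes. The real `serving()` belongs to an async environment; this `Environment` is a minimal stand-in and `pool` is any placeholder object.

```python
from contextlib import contextmanager
from typing import Iterator


class Environment:
    """Minimal stand-in: `_interception` exists only while serving() is active."""

    _interception: object  # annotated, never defaulted to None

    @contextmanager
    def serving(self, pool: object) -> Iterator[None]:
        self._interception = pool
        try:
            yield
        finally:
            # Delete rather than reset to None, so any use outside serving()
            # fails loudly, just as before the first serving().
            del self._interception

    def episode(self) -> object:
        return self._interception  # no None-guard needed inside serving()
```

Outside `serving()`, `env.episode()` raises `AttributeError` because the attribute no longer exists, rather than silently returning a torn-down pool.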
# Conflicts:
#	verifiers/v1/runtimes/base.py
Approvability verdict: Needs human review. This PR introduces a new feature (configurable tunneling) with new abstractions, configuration types, and runtime behavior changes. Unresolved review comments identify potential bugs, including a crash scenario with custom tunnels and improper exception propagation.
from verifiers.v1.interception.tunnel.prime import PrimeTunnel

def tunnel_cls(config: InterceptionConfig) -> type[Tunnel]:
do we really need this function?
__all__ = [
    "Tunnel",
    "PrimeTunnel",
why do we export those? we only need the configs for downstream usage, no?
i think it's good practice, and e.g. the prime runtime is using it iirc
# Each server owns its own tunnel instance: it binds where the tunnel reaches it
# (bind_host/bind_port) and exposes that bound port to the harness via `server.reachable`.
# Both are owned by the pool's stack, torn down with it (LIFO).
server = InterceptionServer(self.tunnel_cls(self.config))
when we use a custom tunnel, it will re-use the same port and then crash
custom tunnel is single server by default. since you bind a single url, we cannot autoscale the interception servers like we can with prime tunnels, which we can provision and tear down on the fly, so i'd say this is by design
Per review: multiplex and the pool only make sense for prime (managing the prime_tunnel creation cap); a custom BYO endpoint is structurally one server.
- move `multiplex` off `BaseInterceptionConfig` onto `PrimeInterceptionConfig` (custom no longer carries a dead field)
- add an `Interception` ABC (one `acquire`); `InterceptionPool` (prime, multiplexed) and `SingleInterception` (custom, one server) implement it; `Environment.interception()` picks by config type
- drop `Tunnel.single_server`: the pool/single split now lives at the Interception level, not as a flag on the tunnel
- remove the `tunnel_cls` dispatch function: the pool is prime-only (always PrimeTunnel), SingleInterception builds CustomTunnel directly
- trim `interception.__all__` to the config types (tunnels are internal)
Verified: custom (SingleInterception) reward=1.0; prime (pool) reward=1.0 with tunnel lifecycle 1 appears / 0 leftover.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Server
The custom case is one server, so a separate SingleInterception was just ceremony around one InterceptionServer. Make InterceptionServer itself the single-server Interception:
- InterceptionServer(tunnel, is_local) subclasses Interception: its __aenter__ binds AND makes itself reachable (sets base_url), and it gains acquire(); delete SingleInterception / single.py
- InterceptionPool keeps a server per multiplex slot and delegates acquire to server.acquire (slot logic lives in one place); PooledServer drops base_url
- make_interception(runtime, config) factory picks server (custom) vs pool (prime), mirroring make_runtime; Environment.interception() calls it
- base.py imports RolloutSession under TYPE_CHECKING to break the cycle
- fix the interception-down log to report the real bound host (drop the stale _HOST constant); it now matches the up log for a 0.0.0.0 (custom) bind
Verified: local multiplex growth (128 acquires -> 4 servers) + custom single server (128 sessions); real rollouts custom reward=1.0, prime reward=1.0 with tunnel lifecycle 1 appears / 0 leftover.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
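A minimal sketch of the `Interception` shape after this commit: a single `InterceptionServer` is itself an `Interception`, and `InterceptionPool` grows servers and delegates `acquire` to one of them. Real servers bind ports and open tunnels; here each server is just a URL string and a list of sessions, slot release is omitted, and the `make_interception(kind, ...)` signature is hypothetical (the PR's takes a runtime or `is_local` plus the config).

```python
import asyncio
from abc import ABC, abstractmethod
from typing import List, Tuple


class Interception(ABC):
    """What a rollout consumes: one acquire() -> (base_url, secret)."""

    @abstractmethod
    async def acquire(self, session: str) -> Tuple[str, str]: ...


class InterceptionServer(Interception):
    """One server = one base_url; on its own it is the custom (single) shape."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url
        self.sessions: List[str] = []

    async def acquire(self, session: str) -> Tuple[str, str]:
        self.sessions.append(session)
        return self.base_url, f"secret-{session}"  # placeholder secret


class InterceptionPool(Interception):
    """Prime shape: add a server whenever every existing one is full."""

    def __init__(self, multiplex: int) -> None:
        self.multiplex = multiplex
        self._servers: List[InterceptionServer] = []
        self._lock = asyncio.Lock()

    async def acquire(self, session: str) -> Tuple[str, str]:
        async with self._lock:  # pick-or-grow must not race
            server = next(
                (s for s in self._servers if len(s.sessions) < self.multiplex), None
            )
            if server is None:
                # In the PR, each new server sits behind its own PrimeTunnel.
                server = InterceptionServer(f"https://tunnel-{len(self._servers)}.example")
                self._servers.append(server)
            # Delegate: the slot bookkeeping lives on the server.
            return await server.acquire(session)


def make_interception(kind: str, *, multiplex: int = 32, url: str = "") -> Interception:
    """Hypothetical factory: pool for prime, one shared server for custom."""
    return InterceptionPool(multiplex) if kind == "prime" else InterceptionServer(url)


async def demo() -> Tuple[int, int]:
    pool = make_interception("prime", multiplex=32)
    single = make_interception("custom", url="https://proxy.example")
    await asyncio.gather(*(pool.acquire(f"s{i}") for i in range(128)))
    await asyncio.gather(*(single.acquire(f"s{i}") for i in range(128)))
    return len(pool._servers), len(single.sessions)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With `multiplex=32`, 128 concurrent acquires fill exactly four servers, matching the growth check in the commit message, while the custom shape puts all 128 sessions on one server.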
async def __aexit__(self, *exc) -> None:
    # tears down every server (+ its tunnel) on `_stack`, LIFO, even if one teardown fails
    await self._stack.aclose()
🟢 Low interception/base.py:33
self._stack.aclose() always passes (None, None, None) to the stack's exit callbacks, so exceptions passed to Interception.__aexit__ are dropped and nested context managers that could suppress them never receive the exception info. The -> None return type also discards any suppression value the stack returns. Change the call to return await self._stack.__aexit__(*exc) and update the return type to bool | None.
- async def __aexit__(self, *exc) -> None:
+ async def __aexit__(self, *exc) -> bool | None:
# tears down every server (+ its tunnel) on `_stack`, LIFO, even if one teardown fails
- await self._stack.aclose()
+ return await self._stack.__aexit__(*exc)
Evidence trail:
verifiers/v1/interception/base.py lines 33-35 (REVIEWED_COMMIT): `async def __aexit__(self, *exc) -> None:` calls `await self._stack.aclose()` instead of `return await self._stack.__aexit__(*exc)`. CPython contextlib.py `AsyncExitStack.aclose` implementation: `async def aclose(self): await self.__aexit__(None, None, None)` — confirms aclose always passes (None, None, None). Python docs: 'close()' / 'aclose()' — 'the arguments passed in will indicate that no exception occurred.'
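The review's point can be checked with a small experiment: an inner context manager that would suppress a `ValueError` only sees it when the exception is forwarded to `AsyncExitStack.__aexit__`. `ClosingInterception`, `ForwardingInterception`, `swallow_value_errors`, and `run` are hypothetical names modeled on the snippet above.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager


class ClosingInterception:
    """Reviewed code: aclose() always reports 'no exception' to the stack."""

    def __init__(self) -> None:
        self._stack = AsyncExitStack()

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc) -> None:
        await self._stack.aclose()


class ForwardingInterception(ClosingInterception):
    """Suggested fix: forward the real exception and any suppression result."""

    async def __aexit__(self, *exc) -> "bool | None":
        return await self._stack.__aexit__(*exc)


@asynccontextmanager
async def swallow_value_errors(seen: list):
    try:
        yield
    except ValueError as exc:  # only reached if the stack forwards the exception
        seen.append(exc)


async def run(cls) -> tuple:
    seen: list = []
    try:
        async with cls() as ic:
            await ic._stack.enter_async_context(swallow_value_errors(seen))
            raise ValueError("rollout failed")
    except ValueError:
        return len(seen), "propagated"
    return len(seen), "suppressed"


async def demo() -> tuple:
    return await run(ClosingInterception), await run(ForwardingInterception)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

`demo()` yields `((0, 'propagated'), (1, 'suppressed'))`: with `aclose()` the inner manager never sees the error, while forwarding lets it observe and suppress it.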
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Reviewed by Cursor Bugbot for commit b1e522e.
is_local is a deployment fact (is the consumer on the host network?), not a tunnel concern. The localhost short-circuit lived inside Tunnel.reachable, so even the local case went through the tunnel object. Make the tunnel purely about remote exposure:
- drop Tunnel.reachable; a Tunnel is now just bind_host/bind_port + expose()
- InterceptionServer.__aenter__ branches on is_local: local -> bind an ephemeral loopback port, reach at localhost (tunnel untouched); remote -> bind per the tunnel and expose()
- reachable_url likewise: local consumer -> localhost; remote -> PrimeTunnel().expose
- a local custom harness now ignores the BYO url/port (localhost on any port) instead of binding the fixed port it doesn't need
Verified: local prime 4 servers + custom-local uses ephemeral loopback (BYO ignored); remote custom reward=1.0; remote prime reward=1.0 with tunnel lifecycle 1 created/1 deleted (1 appears/0 leftover, run in isolation).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
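A runnable sketch of the `is_local` branch in `InterceptionServer.__aenter__`, using `asyncio.start_server` as a stand-in for the real HTTP server. `FixedUrlTunnel` is a hypothetical stand-in for `CustomTunnel`, and the connection handler just closes each connection.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager
from typing import AsyncIterator, Optional


class FixedUrlTunnel:
    """Hypothetical stand-in for CustomTunnel: fixed URL, binds all interfaces."""

    bind_host = "0.0.0.0"

    def __init__(self, url: str, port: Optional[int] = None) -> None:
        self.url = url
        self.bind_port = port

    @asynccontextmanager
    async def expose(self, port: int) -> AsyncIterator[str]:
        yield self.url  # nothing to open: the operator fronts the port


async def _close_immediately(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    writer.close()  # placeholder; the real server speaks HTTP


class InterceptionServer:
    def __init__(self, tunnel: FixedUrlTunnel, is_local: bool) -> None:
        self.tunnel = tunnel
        self.is_local = is_local
        self.base_url: Optional[str] = None
        self._stack = AsyncExitStack()

    async def __aenter__(self) -> "InterceptionServer":
        try:
            if self.is_local:
                # Local consumer: ephemeral loopback port; the tunnel is never touched.
                host, port = "127.0.0.1", 0
            else:
                # Remote consumer: bind wherever the tunnel says.
                host, port = self.tunnel.bind_host, self.tunnel.bind_port or 0
            server = await asyncio.start_server(_close_immediately, host, port)
            self._stack.push_async_callback(server.wait_closed)
            self._stack.callback(server.close)  # LIFO: close() runs before wait_closed()
            bound = server.sockets[0].getsockname()[1]
            if self.is_local:
                self.base_url = f"http://localhost:{bound}"
            else:
                self.base_url = await self._stack.enter_async_context(self.tunnel.expose(bound))
            return self
        except BaseException:
            await self._stack.aclose()  # don't leak a half-started server
            raise

    async def __aexit__(self, *exc) -> Optional[bool]:
        return await self._stack.__aexit__(*exc)


async def demo() -> tuple:
    byo = FixedUrlTunnel("https://proxy.example", 8765)
    async with InterceptionServer(byo, is_local=True) as local:
        local_url = local.base_url  # BYO url/port ignored for a local consumer
    async with InterceptionServer(FixedUrlTunnel("https://proxy.example"), is_local=False) as remote:
        remote_url = remote.base_url
    return local_url, remote_url


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The local server ignores the tunnel's fixed port and URL entirely, which is the behavior change this commit calls out for a local custom harness.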
The interception was reached two ways: the harness used base_url (the configured
tunnel, honoring custom), while non-colocated tool/user servers recomputed their
own reach via reachable_url(HOST, state_port) — hardcoded prime. Unify to one
base_url used by everyone:
- decide reachability from ALL consumers, not just the harness: expose via the
configured tunnel if the harness OR any tool/user runtime is remote, localhost
only when everything is local (Environment.interception reads tool/user
runtimes off the first task, as shared_tools already does)
- harness reaches the model at {base_url}/v1; tool/user servers reach /state +
/task at base_url directly — no per-server recompute, no second tunnel, and
custom is honored everywhere
- collapse Slot to (base_url, secret); drop state_port and the
reachable_url(HOST, state_port) paths in serve/serve_tools/serve_user
- make_interception / InterceptionPool now take is_local (bool); the policy lives
in Environment
Verified: 23 fast tests; local /state e2e (tool_state, user, shared isolation) 6
passed; cross-runtime e2e harness-in-prime + tool-in-subprocess passed (one
public base_url used by both a remote harness and a local tool); custom echo
reward=1.0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
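The two rules in this commit (one `base_url` for everyone; localhost only when every consumer is local) reduce to a couple of small functions. `RuntimeInfo`, `interception_is_local`, and `consumer_urls` are hypothetical names, not the PR's API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class RuntimeInfo:
    """Hypothetical: the only fact this decision needs about a runtime."""
    name: str
    is_local: bool


def interception_is_local(harness: RuntimeInfo, others: Iterable[RuntimeInfo]) -> bool:
    # localhost is only safe when *every* consumer shares the host network;
    # one remote harness, tool, or user runtime forces the configured tunnel.
    return harness.is_local and all(r.is_local for r in others)


def consumer_urls(base_url: str) -> Dict[str, str]:
    # One base_url for everyone: the harness talks to the model under /v1,
    # tool/user servers hit /state and /task on the same root.
    root = base_url.rstrip("/")
    return {"model": f"{root}/v1", "state": f"{root}/state", "task": f"{root}/task"}
```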
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reachability between a service and its consumer is a cross-cutting serving concern, not a single runtime's. After the interception unification, reachable_url's only caller is mcp serving, yet it lived in runtimes/base.py and reached back into interception.tunnel (a lazy import to dodge a cycle). Move it next to its caller:
- relocate reachable_url to mcp/launch.py; it imports PrimeTunnel directly (no lazy import, no runtimes -> interception back-dependency)
- collapse the signature to reachable_url(service, port, *, colocated, consumer_is_local) -- drop the consumer Runtime|None duality + the service-is-consumer identity check; the caller passes the two bools it has. consumer_is_local for a tool is read off the harness runtime object (a HOST-driven user is local; a shared eval-level tool uses harness_is_local)
- delete the now-unused _Host / HOST sentinel (it existed only for reachable_url)
- drop the dead HOST-as-service handling + the stale "interception reachability" docstring; Runtime keeps only expose (the real per-runtime primitive)
Verified: ruff + 23 fast tests; /state e2e across all four reachability branches -- host colocated/own-runtime (localhost), modal tool/user/shared (expose), and modal-harness + host-tool (tunnel) -- all pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
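One reading of the four reachability branches listed in the verification line, written as a pure decision function. `route`, `Route`, and the `service_on_host` flag are hypothetical; the real `reachable_url(service, port, *, colocated, consumer_is_local)` opens the connection rather than returning a label.

```python
from enum import Enum


class Route(Enum):
    LOCALHOST = "localhost"  # same network namespace
    EXPOSE = "expose"        # the service's own runtime publishes the port
    TUNNEL = "tunnel"        # a host service published outward (PrimeTunnel)


def route(*, colocated: bool, service_on_host: bool, consumer_is_local: bool) -> Route:
    """Hypothetical reading of the branches named in the commit above."""
    if colocated:
        return Route.LOCALHOST  # service runs inside the consumer's runtime
    if not service_on_host:
        return Route.EXPOSE     # e.g. a modal tool/user/shared server
    if consumer_is_local:
        return Route.LOCALHOST  # host service, host consumer
    return Route.TUNNEL         # host service, remote consumer (e.g. modal harness)
```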

Summary

Adds an `InterceptionConfig` to `EnvConfig`, making how the host interception server is reached from a remote harness runtime a first-class, pluggable choice instead of a hardcoded `prime_tunnel` call. It's a discriminated union (`--interception.type prime|custom`):
- `prime` (default): expose the host interception port via `prime_tunnel` (frpc). Identical to today's behavior; works from any host with prime credentials, for harnesses in prime or modal sandboxes alike. Pooled: `multiplex` rollouts share one server (one tunnel), grown on demand to stay under the prime_tunnel creation cap.
- `custom`: bring your own endpoint. The framework opens no tunnel; the server binds all interfaces on a fixed `port` and the harness reaches it at a public `url`, either a reverse proxy you front it with (nginx/caddy, ngrok, …) or a direct `http://<host>:<port>` on a reachable host. One URL is one server, shared by every rollout (no pool, no multiplex). The port is plaintext HTTP (auth'd by the per-rollout secret), so front it with TLS or a firewall on an untrusted network.

What changed

- `verifiers/v1/interception/tunnel/` (new subpackage, mirroring `runtimes/`): `base.py` holds `Tunnel`, the contract for making the host interception server reachable from a remote harness. It is the host-side counterpart to a `Runtime` (`Runtime.expose` publishes a port inside a sandbox; `Tunnel.expose` publishes a host port outward), and it is generic over its config (`Tunnel[ConfigT]`) so a subclass's `self.config` is typed. `prime.py` / `custom.py` are the implementations (`PrimeTunnel`, `CustomTunnel`), each owning its `bind_host`, `bind_port`, and `expose()`. `PrimeTunnel.expose` inlines the prime_tunnel mechanism + retry and owns the host-wide `TUNNEL_LIMITER` (512/min); `CustomTunnel` binds `0.0.0.0` (all interfaces, matching the v0 path) and yields the configured `url`.
- `verifiers/v1/interception/config.py` (new): the `InterceptionConfig` discriminated union. `multiplex` lives on `PrimeInterceptionConfig` (it manages the prime_tunnel creation cap, a prime concern); `BaseInterceptionConfig` is just the union's common base.
- `Interception` ABC (new `base.py`) with one method, `acquire(session) -> (base_url, secret)`, behind two shapes picked by config type (`make_interception`, mirroring `make_runtime`):
  - `InterceptionPool` (prime): grows servers, one behind its own `PrimeTunnel` per `multiplex` rollouts, and delegates `acquire` to the chosen server.
  - `InterceptionServer` (custom): the server is the single-server `Interception`. It binds the BYO endpoint, makes itself reachable, and every rollout shares it (no pool, no growth). The pool composes many of these for prime.
- `base_url` for the whole interception server, used by every consumer. The harness reaches the model at `{base_url}/v1`; tool/user servers reach this rollout's `/state` + `/task` at `base_url` directly. Reachability is decided from all consumers, not just the harness: the interception is exposed via its configured tunnel if the harness or any tool/user runtime is remote, and reached at `localhost` only when everything is local (`Environment.interception` reads tool/user runtimes off the first task, as `shared_tools` already does). This removes the second reach path (non-colocated tool/user servers no longer recompute via `reachable_url(HOST, state_port, …)`, which hardcoded prime), so `custom` is honored for every consumer, and `Slot` collapses to `(base_url, secret)` (`state_port` is gone).
- `InterceptionServer(tunnel, is_local)`: on enter, local → bind an ephemeral loopback port, reach at `localhost` (tunnel untouched); remote → bind where the tunnel says (`bind_host`/`bind_port`) and `expose()` it. A `Tunnel` therefore knows nothing about locality; it is purely remote exposure.
- `TunnelError` is scoped to tunnel setup; a rollout-body error propagates unchanged.
- `EnvConfig.multiplex` → `EnvConfig.interception` (default `PrimeInterceptionConfig()`), preserving the current default behavior.
- `reachable_url` (server↔consumer reachability) moved from `runtimes` to `mcp`: it's a serving concern, and after the unification its only caller is mcp's tool/user serving. It now imports `PrimeTunnel` directly (removing the `runtimes → interception` back-dependency + a lazy import), and its signature is simplified to `(service, port, *, colocated, consumer_is_local)`, with no `consumer: Runtime | None` duality. The `_Host`/`HOST` sentinel (used only by it) is deleted; `Runtime` keeps only `expose`, the genuine per-runtime primitive.

Breaking

- `EnvConfig.multiplex` moved to `EnvConfig.interception.multiplex`, and is now a `prime`-only field. The default (`PrimeInterceptionConfig(multiplex=32)`) keeps existing runs behaving exactly as before; only callers that set `multiplex` need to migrate. TOML: `multiplex = 64` → `[interception]` + `multiplex = 64` (`type` defaults to `prime`). CLI: `--multiplex 64` → `--interception.multiplex 64`. `custom` has no `multiplex` (it's structurally one server), so setting it there is a config error.
- `InterceptionPool(runtime_config, multiplex)` → `InterceptionPool(is_local, config)` where `config` is a `PrimeInterceptionConfig`. The `custom` type is served by a single `InterceptionServer` instead; both implement the `Interception` interface that `Rollout` consumes (build via `make_interception(is_local, config)`).
- `serve` / `serve_tools` / `serve_user` (mcp) drop the `state_port` parameter: tool/user servers now reach the interception's `/state` channel via `state_base` (the one `base_url`) instead of recomputing a host-port tunnel.
- `runtimes` no longer exports `reachable_url` or `HOST` (relocated to `mcp`); `reachable_url`'s signature changed to `(service, port, *, colocated, consumer_is_local)`.

Verification

All real eval runs against PI inference (`deepseek/deepseek-v4-flash`), harnesses in real sandboxes.

Single rollout per type (`echo-v1`, prime-sandbox harness, each `reward=1.0`, `num_turns=1`, `errors=[]`):
- `prime`: `prime_tunnel`, reached from the prime-sandbox harness
- `custom`: a `prime_tunnel` handed in as an opaque BYO `url` + fixed `port`
- `custom`: `url=http://<public-ip>:<port>`, harness hitting this host directly (no proxy/tunnel)

For `prime`, asserted the tunnel lifecycle against the prime tunnel API (`TunnelClient.list_tunnels`): exactly 1 interception tunnel appears during the run and 0 are left over after it (torn down).

Scale: 128 concurrent `gsm8k-v1` rollouts (modal-sandbox harnesses; interception is host-side and orthogonal to the harness runtime):
(… `multiplex=32`)

Server/tunnel counts also confirmed deterministically: 128 concurrent `pool.acquire`s → `len(pool._servers) == 4` (prime) / 1 (custom).

Tests: `ruff check` + `ruff format` clean on the touched files; `tests/v1 -m "not e2e and not prime and not modal"` (23) passes.

Note
High Risk
Touches core eval networking (tunnels, shared state, remote harness/tool reachability) with a breaking `EnvConfig.multiplex` move; a miscomputed `is_local` or custom URLs could break rollouts at scale.

Overview
Adds `InterceptionConfig` on `EnvConfig` (`prime` default vs `custom` BYO `url`/`port`), replacing top-level `multiplex` with `interception.multiplex` for prime only.

Introduces an `Interception` ABC (`acquire` → `(base_url, secret)`), `make_interception`, and `interception/tunnel/` (`PrimeTunnel`, `CustomTunnel`). `InterceptionServer` takes a tunnel and owns bind/expose; `InterceptionPool` is built from `is_local` + `PrimeInterceptionConfig` instead of harness runtime config.

Reachability is computed from the harness and any remote tool/user runtimes (`Environment._has_remote_server`), so one `base_url` serves model (`/v1`) and shared state (`/state`, `/task`). MCP launch drops `state_port` and per-rollout host bridges; `host_endpoint`, `HOST`, and runtime `reachable_url` move out of `runtimes/base` (tool reachability stays in `mcp/launch` with `PrimeTunnel`).

Reviewed by Cursor Bugbot for commit a121f4f.
Changes since #1830 opened
- `HOST` constant, `_Host` class, and `reachable_url` async context manager from `verifiers.v1.runtimes.base` module and removed their exports from `verifiers.v1.runtimes` package [a121f4f]
- `reachable_url` async context manager from `verifiers.v1.runtimes.base` to `verifiers.v1.mcp.launch` with a new signature replacing runtime-based consumer parameter with explicit boolean flags [a121f4f]
- `mcp.launch.serve` async context manager to compute explicit `colocated` and `consumer_is_local` boolean flags for `reachable_url` instead of passing consumer runtime or `HOST` sentinel [a121f4f]
- `TunnelError` and `PrimeTunnel` to `verifiers.v1.mcp.launch` and updated `runtimes` import to only include `Runtime` and `make_runtime` [a121f4f]