feat: configurable tunneling in interception by mikasenghaas · Pull Request #1830 · PrimeIntellect-ai/verifiers

mikasenghaas · 2026-06-22T23:38:06Z

Summary

Adds an InterceptionConfig to EnvConfig, turning the way the host interception server is reached from a remote harness runtime into a first-class, pluggable choice instead of a hardcoded prime_tunnel call. It's a discriminated union (--interception.type prime|custom):

prime (default): expose the host interception port via prime_tunnel (frpc). Identical to today's behavior; works from any host with prime credentials, for harnesses in prime or modal sandboxes alike. Pooled: up to `multiplex` rollouts share one server (one tunnel), and servers are added on demand so tunnel creation stays under the prime_tunnel creation cap.
custom: bring your own endpoint. The framework opens no tunnel; the server binds all interfaces on a fixed port and the harness reaches it at a public url, either through a reverse proxy you put in front of it (nginx/caddy, ngrok, …) or directly at http://<host>:<port> on a reachable host. One URL is one server, shared by every rollout (no pool, no multiplex). The port speaks plaintext HTTP (authenticated only by the per-rollout secret), so put TLS or a firewall in front of it on an untrusted network.

What changed

verifiers/v1/interception/tunnel/ (new subpackage, mirroring runtimes/): base.py holds Tunnel, the contract for making the host interception server reachable from a remote harness — the host-side counterpart to a Runtime (Runtime.expose publishes a port inside a sandbox; Tunnel.expose publishes a host port outward). It's generic over its config (Tunnel[ConfigT]) so a subclass's self.config is typed. prime.py / custom.py are the implementations (PrimeTunnel, CustomTunnel), each owning its bind_host, bind_port, and expose(); PrimeTunnel.expose inlines the prime_tunnel mechanism + retry and owns the host-wide TUNNEL_LIMITER (512/min); CustomTunnel binds 0.0.0.0 (all interfaces, matching the v0 path) and yields the configured url.
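For illustration, here is a minimal sketch of the Tunnel contract described above, assuming `expose(port)` is an async context manager that yields the public URL. Only the names Tunnel, CustomTunnel, bind_host, bind_port and expose come from the PR; the method signatures, the `CustomConfig` dataclass and the `demo_tunnel` helper are assumptions, not the actual verifiers code.

```python
import asyncio
from abc import ABC, abstractmethod
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import AsyncContextManager, AsyncIterator, ClassVar, Generic, TypeVar

ConfigT = TypeVar("ConfigT")


class Tunnel(ABC, Generic[ConfigT]):
    """Publishes a host port outward so a remote harness can reach it (sketch)."""

    # Interface the interception server binds on; loopback unless overridden.
    bind_host: ClassVar[str] = "127.0.0.1"

    def __init__(self, config: ConfigT) -> None:
        self.config = config  # typed per subclass via Tunnel[SomeConfig]

    @property
    def bind_port(self) -> int:
        return 0  # 0 asks the OS for an ephemeral port

    @abstractmethod
    def expose(self, port: int) -> AsyncContextManager[str]:
        """Make `port` reachable; yields the URL the harness should use."""


@dataclass
class CustomConfig:  # hypothetical stand-in for the custom interception config
    url: str  # public URL the operator already routed to `port`
    port: int  # fixed local port the server must bind


class CustomTunnel(Tunnel[CustomConfig]):
    bind_host: ClassVar[str] = "0.0.0.0"  # all interfaces, as in the v0 path

    @property
    def bind_port(self) -> int:
        return self.config.port

    @asynccontextmanager
    async def expose(self, port: int) -> AsyncIterator[str]:
        # Nothing to open or tear down: just hand back the configured URL.
        yield self.config.url


async def demo_tunnel() -> tuple:
    tunnel = CustomTunnel(CustomConfig(url="https://interception.example.com", port=8765))
    async with tunnel.expose(tunnel.bind_port) as url:
        return tunnel.bind_host, tunnel.bind_port, url
```

Making `expose` a context manager lets a tunnel-backed implementation tear its tunnel down on exit, while a CustomTunnel-like class has nothing to open or close.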
verifiers/v1/interception/config.py (new): the InterceptionConfig discriminated union. multiplex lives on PrimeInterceptionConfig (it manages the prime_tunnel creation cap — a prime concern); BaseInterceptionConfig is just the union's common base.
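A rough, stdlib-only sketch of the InterceptionConfig union: a `type` discriminator that defaults to prime, `multiplex` only on the prime variant, and `url`/`port` only on the custom one. The class names follow the PR, but the dataclasses and the hypothetical `parse_interception` dispatcher stand in for whatever validation the real config.py uses; every default other than `multiplex=32` is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Literal, Union


@dataclass
class BaseInterceptionConfig:
    """Common base of the union; carries no fields in this sketch."""


@dataclass
class PrimeInterceptionConfig(BaseInterceptionConfig):
    multiplex: int = 32  # rollouts per server (and per prime_tunnel)
    type: Literal["prime"] = "prime"


@dataclass
class CustomInterceptionConfig(BaseInterceptionConfig):
    url: str  # BYO public URL (reverse proxy or http://<host>:<port>)
    port: int  # fixed local port behind that URL
    type: Literal["custom"] = "custom"


InterceptionConfig = Union[PrimeInterceptionConfig, CustomInterceptionConfig]

_BY_TYPE = {"prime": PrimeInterceptionConfig, "custom": CustomInterceptionConfig}


def parse_interception(raw: Dict[str, Any]) -> InterceptionConfig:
    """Pick the variant from the `type` discriminator (prime when omitted)."""
    data = dict(raw)
    kind = data.pop("type", "prime")
    if kind not in _BY_TYPE:
        raise ValueError(f"unknown interception type: {kind!r}")
    try:
        return _BY_TYPE[kind](**data)
    except TypeError as err:  # unknown field, e.g. multiplex on custom
        raise ValueError(f"invalid {kind} interception config: {err}") from None
```

Because the custom variant simply has no `multiplex` field, passing one fails at construction time, which matches the rule that setting it on custom is a config error.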
Interception ABC (new base.py) with one method, acquire(session) -> (base_url, secret), implemented by two shapes chosen by config type (make_interception, mirroring make_runtime):
- InterceptionPool (prime): grows servers on demand, one per `multiplex` rollouts, each behind its own PrimeTunnel, and delegates acquire to the chosen server.
- InterceptionServer (custom): the server itself is the single-server Interception. It binds the BYO endpoint, makes itself reachable, and every rollout shares it (no pool, no growth). The pool composes many of these for prime.
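The pool's growth rule can be sketched as follows, assuming a server is reused until it holds `multiplex` sessions and a new one (with its own tunnel) is started only when every existing server is full. `InterceptionPoolSketch`, `FakeServer` and `demo_pool` are hypothetical toys; session release, real binding and real tunnel setup are left out, and the random per-rollout secret is an assumption.

```python
import asyncio
import secrets
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FakeServer:
    """Stand-in for one InterceptionServer behind its own tunnel."""

    base_url: str
    sessions: int = 0

    def acquire(self, session: str) -> Tuple[str, str]:
        self.sessions += 1
        # Assumption: each rollout gets a fresh random secret.
        return self.base_url, secrets.token_hex(16)


class InterceptionPoolSketch:
    def __init__(self, multiplex: int) -> None:
        self.multiplex = multiplex
        self._servers: List[FakeServer] = []
        self._lock = asyncio.Lock()

    async def _start_server(self) -> FakeServer:
        await asyncio.sleep(0)  # placeholder for bind + tunnel setup
        return FakeServer(base_url=f"https://tunnel-{len(self._servers)}.example")

    async def acquire(self, session: str) -> Tuple[str, str]:
        # One lock so concurrent rollouts never start two servers for one free slot.
        async with self._lock:
            server = next((s for s in self._servers if s.sessions < self.multiplex), None)
            if server is None:
                server = await self._start_server()
                self._servers.append(server)
            return server.acquire(session)


async def demo_pool(rollouts: int = 128, multiplex: int = 32) -> int:
    pool = InterceptionPoolSketch(multiplex)
    await asyncio.gather(*(pool.acquire(f"rollout-{i}") for i in range(rollouts)))
    return len(pool._servers)
```

With `multiplex=32`, 128 concurrent acquires end with 4 servers in this toy, the same count the PR reports for prime.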
One base_url for the whole interception server, used by every consumer. The harness reaches the model at {base_url}/v1; tool/user servers reach this rollout's /state + /task at base_url directly. Reachability is decided from all consumers, not just the harness: the interception is exposed via its configured tunnel if the harness or any tool/user runtime is remote, and reached at localhost only when everything is local (Environment.interception reads tool/user runtimes off the first task, as shared_tools already does). This removes the second reach path (non-colocated tool/user servers no longer recompute via reachable_url(HOST, state_port, …), which hardcoded prime), so custom is honored for every consumer — and Slot collapses to (base_url, secret) (state_port is gone).
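The locality rule above fits in a few lines. This sketch assumes each runtime can report whether it is local; `has_remote_server` borrows its name from Environment._has_remote_server in the Bugbot overview, but its signature here is hypothetical, as are `Endpoints` and `endpoints`, and the path layout ignores any per-rollout routing the real server may add.

```python
from dataclasses import dataclass
from typing import Iterable


def has_remote_server(harness_is_local: bool, tool_user_is_local: Iterable[bool]) -> bool:
    """True if any consumer (harness, tool or user server) runs off-host."""
    return not harness_is_local or not all(tool_user_is_local)


@dataclass(frozen=True)
class Endpoints:
    model: str  # harness -> model: {base_url}/v1
    state: str  # tool/user servers -> rollout state
    task: str  # tool/user servers -> rollout task


def endpoints(base_url: str) -> Endpoints:
    # Every consumer derives its URL from the same base_url.
    base = base_url.rstrip("/")
    return Endpoints(model=f"{base}/v1", state=f"{base}/state", task=f"{base}/task")
```

Since all three paths hang off one base_url, a custom endpoint automatically applies to tool/user servers as well as to the harness.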
InterceptionServer(tunnel, is_local): on enter, local → bind an ephemeral loopback port, reach at localhost (tunnel untouched); remote → bind where the tunnel says (bind_host / bind_port) and expose() it. A Tunnel therefore knows nothing about locality — it's purely remote exposure. TunnelError is scoped to tunnel setup; a rollout-body error propagates unchanged.
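A sketch of that enter logic, assuming tunnels expose `bind_host`, `bind_port` and an async-context-manager `expose(port)`. `reachable`, `bind_address`, `StaticTunnel` and `BrokenTunnel` are hypothetical names; the point is the exit-stack scoping, where only a failure inside `expose` becomes a TunnelError and an exception from the rollout body passes through untouched.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager
from typing import AsyncIterator, Tuple


class TunnelError(RuntimeError):
    """Raised only when tunnel setup fails."""


class StaticTunnel:
    """Toy tunnel whose expose() yields a fixed URL."""

    bind_host = "0.0.0.0"
    bind_port = 8765

    def __init__(self, url: str) -> None:
        self.url = url

    @asynccontextmanager
    async def expose(self, port: int) -> AsyncIterator[str]:
        yield self.url


class BrokenTunnel(StaticTunnel):
    """Toy tunnel whose setup always fails."""

    @asynccontextmanager
    async def expose(self, port: int) -> AsyncIterator[str]:
        raise ConnectionError("tunnel refused")
        yield self.url  # unreachable; keeps this an async generator


def bind_address(tunnel: StaticTunnel, *, is_local: bool) -> Tuple[str, int]:
    # Local: ephemeral loopback port, tunnel untouched. Remote: where the tunnel says.
    return ("127.0.0.1", 0) if is_local else (tunnel.bind_host, tunnel.bind_port)


@asynccontextmanager
async def reachable(tunnel: StaticTunnel, port: int, *, is_local: bool) -> AsyncIterator[str]:
    if is_local:
        yield f"http://127.0.0.1:{port}"
        return
    async with AsyncExitStack() as stack:
        try:
            url = await stack.enter_async_context(tunnel.expose(port))
        except Exception as err:
            raise TunnelError(f"could not expose port {port}") from err
        # Outside the try: an error from the caller's body propagates unchanged.
        yield url
```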
EnvConfig.multiplex → EnvConfig.interception (default PrimeInterceptionConfig()), preserving the current default behavior.
reachable_url (server↔consumer reachability) moved from runtimes to mcp — it's a serving concern, and after the unification its only caller is mcp's tool/user serving. It now imports PrimeTunnel directly (removing the runtimes → interception back-dependency + a lazy import), and its signature is simplified to (service, port, *, colocated, consumer_is_local) — no consumer: Runtime | None duality. The _Host/HOST sentinel (used only by it) is deleted; Runtime keeps only expose, the genuine per-runtime primitive.

Breaking

EnvConfig.multiplex moved to EnvConfig.interception.multiplex and is now a prime-only field. The default (PrimeInterceptionConfig(multiplex=32)) keeps existing runs behaving exactly as before; only callers that set multiplex need to migrate. TOML: a top-level `multiplex = 64` becomes `multiplex = 64` under an `[interception]` table (type defaults to prime). CLI: `--multiplex 64` becomes `--interception.multiplex 64`. custom has no multiplex (it is structurally one server), so setting it there is a config error.
InterceptionPool(runtime_config, multiplex) → InterceptionPool(is_local, config) where config is a PrimeInterceptionConfig. The custom type is served by a single InterceptionServer instead; both implement the Interception interface that Rollout consumes (build via make_interception(is_local, config)).
Internal serve / serve_tools / serve_user (mcp) drop the state_port parameter — tool/user servers now reach the interception's /state channel via state_base (the one base_url) instead of recomputing a host-port tunnel.
runtimes no longer exports reachable_url or HOST (relocated to mcp); reachable_url's signature changed to (service, port, *, colocated, consumer_is_local).
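The TOML/CLI migration in the first Breaking item amounts to moving one key. A hypothetical `migrate_env_config` helper over plain dicts (not part of the PR) shows the shape of the change, including the rejection for custom:

```python
import copy
from typing import Any, Dict


def migrate_env_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Move a legacy top-level `multiplex` under `interception` (hypothetical).

    TOML before: multiplex = 64
    TOML after:  [interception]
                 multiplex = 64
    """
    cfg = copy.deepcopy(raw)
    if "multiplex" not in cfg:
        return cfg  # runs that never set multiplex need no change
    multiplex = cfg.pop("multiplex")
    interception = cfg.setdefault("interception", {})
    # type defaults to prime; custom has no multiplex at all.
    if interception.get("type", "prime") != "prime":
        raise ValueError("multiplex is prime-only; a custom endpoint is a single server")
    interception.setdefault("multiplex", multiplex)
    return cfg
```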

Verification

All runs are real evals against PI inference (deepseek/deepseek-v4-flash), with harnesses in real sandboxes.

Single rollout per type (echo-v1, prime-sandbox harness); each finished with reward=1.0, num_turns=1, errors=[]:

| type | how the host interception port is reached | result |
| --- | --- | --- |
| prime | `prime_tunnel`, reached from the prime-sandbox harness | reward=1.0 ✓ |
| custom (proxy) | a manually-started `prime_tunnel` handed in as an opaque BYO `url` + fixed `port` | reward=1.0 ✓ |
| custom (direct) | `url=http://<public-ip>:<port>`, harness hitting this host directly (no proxy/tunnel) | reward=1.0 ✓ |

For prime, asserted the tunnel lifecycle against the prime tunnel API (TunnelClient.list_tunnels): exactly 1 interception tunnel appears during the run and 0 are left over after it (torn down).

Scale — 128 concurrent gsm8k-v1 rollouts (modal-sandbox harnesses; interception is host-side and orthogonal to the harness runtime):

| type | interception servers / tunnels | rollouts | errors | mean reward |
| --- | --- | --- | --- | --- |
| prime (`multiplex=32`) | 4 tunnels (128 ÷ 32); prime-API peak 4, 0 left over after | 128 | 0 | 0.945 |
| custom (single BYO) | 1 shared server handling all 128 | 128 | 0 | 0.969 |

Server/tunnel counts were also confirmed deterministically: 128 concurrent pool.acquire calls → len(pool._servers) == 4 (prime) / 1 (custom).

Tests: ruff check + ruff format are clean on the touched files; tests/v1 -m "not e2e and not prime and not modal" (23 tests) pass.

A modal interception type (modal.forward) was also built and verified end-to-end (a real rollout with reward=1.0 running the eval inside a Modal container) but removed before merge — modal.forward only works from inside a Modal container, which is awkward to ship. The modal runtime (for harness sandboxes) is unaffected.

Note

High Risk
Touches core eval networking (tunnels, shared state, remote harness/tool reachability) with a breaking EnvConfig.multiplex move; miscomputed is_local or custom URLs could break rollouts at scale.

Overview
Adds InterceptionConfig on EnvConfig (prime default vs custom BYO url/port), replacing top-level multiplex with interception.multiplex for prime only.

Introduces an Interception ABC (acquire → (base_url, secret)), make_interception, and interception/tunnel/ (PrimeTunnel, CustomTunnel). InterceptionServer takes a tunnel and owns bind/expose; InterceptionPool is built from is_local + PrimeInterceptionConfig instead of harness runtime config.

Reachability is computed from the harness and any remote tool/user runtimes (Environment._has_remote_server), so one base_url serves model (/v1) and shared state (/state, /task). MCP launch drops state_port and per-rollout host bridges; host_endpoint, HOST, and runtime reachable_url move out of runtimes/base (tool reachability stays in mcp/launch with PrimeTunnel).

Reviewed by Cursor Bugbot for commit a121f4f. Bugbot is set up for automated code reviews on this repo.

Changes since #1830 opened

Removed HOST constant, _Host class, and reachable_url async context manager from verifiers.v1.runtimes.base module and removed their exports from verifiers.v1.runtimes package [a121f4f]
Moved and refactored reachable_url async context manager from verifiers.v1.runtimes.base to verifiers.v1.mcp.launch with a new signature replacing runtime-based consumer parameter with explicit boolean flags [a121f4f]
Updated mcp.launch.serve async context manager to compute explicit colocated and consumer_is_local boolean flags for reachable_url instead of passing consumer runtime or HOST sentinel [a121f4f]
Added imports of TunnelError and PrimeTunnel to verifiers.v1.mcp.launch and updated runtimes import to only include Runtime and make_runtime [a121f4f]

Make *how the host interception server is reached from a remote harness runtime* a first-class, pluggable choice instead of a hardcoded prime_tunnel call. `EnvConfig.interception` is a discriminated union (`--interception.type prime|modal|url`) over three backends, with `multiplex` as the shared field (moved off the root EnvConfig):
- prime (default): expose the host port via prime_tunnel (frpc); unchanged behavior, works from any host with prime credentials.
- modal: expose it via Modal's own forwarding (modal.forward), for a Modal-hosted trainer/eval (modal.forward only works inside a container).
- url: bring your own reverse proxy; open no tunnel, reach the server at a public `url` you front the fixed local `port` with. One URL == one server.
`expose_interception(config, port, *, is_local)` is the single place a type maps to a reachable URL (localhost when the runtime is local, whatever the type). InterceptionPool now takes the config; InterceptionServer can bind a fixed port for the BYO target.
BREAKING: EnvConfig.multiplex -> EnvConfig.interception.multiplex. The default (PrimeInterceptionConfig(multiplex=32)) keeps existing runs unchanged; only callers that set multiplex migrate (`--multiplex N` -> `--interception.multiplex N`).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Move the host-reachability logic out of free functions (expose_interception / _modal_host_endpoint / bind_host) into a lightweight `Tunnel` class hierarchy, parallel to the Runtime hierarchy: `Tunnel` (the contract) + `PrimeTunnel`, `ModalTunnel`, `CustomTunnel`. Each owns its `bind_host`, `bind_port`, `single_server`, and `expose()`; the base `reachable()` handles the local short-circuit. Each `InterceptionConfig` owns its tunnel via `config.tunnel()`, so the pool is branch-free — it builds the tunnel once and drives the server through `bind_port`/`bind_host`/`reachable()`/`single_server`. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Split the single tunnel module into a package mirroring runtimes/: base.py (the Tunnel contract), prime.py / modal.py / custom.py (the implementations), and __init__.py with the make_tunnel(config) factory + _tunnel_cls dispatch. Tunnels are built from a config via make_tunnel (no method on the config), the host-side counterpart to make_runtime. Also tighten the EnvConfig.interception docstring ("from a remote runtime"). No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Match the tunnel class name: `UrlInterceptionConfig` -> `CustomInterceptionConfig`, discriminator `type="url"` -> `"custom"` (so `--interception.type prime|modal|custom`), served by `CustomTunnel`. The `url` field is unchanged (it is a URL). Also trim the config docstrings to one-liners — the per-type caveats (prime credentials, modal-container-only) live on the Tunnel classes now — and drop the `multiplex` paragraph from the config module docstring. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…untime import
- Derive `Tunnel.single_server` from `bind_port` (a fixed bind port can host only one server, so it's a single endpoint) instead of a separate ClassVar + `CustomTunnel` override. `CustomTunnel` now only overrides `bind_port`.
- `ModalTunnel.expose` uses `modal.forward` directly (it is already an async context manager) instead of borrowing `runtimes.base.open_tunnel` for retry/wrapping, so `modal.py` imports nothing from `runtimes`. `prime.py` still reuses `host_endpoint`, the rate-limited prime_tunnel primitive shared with the tool-serving `reachable_url` path (the correct higher->lower dependency; duplicating it would also pull in the shared tunnel limiter).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… tunnel
- Relocate `host_endpoint` + its `open_tunnel` retry helper out of `runtimes/base.py` into `interception/tunnel/prime.py` (the prime_tunnel primitive, alongside `PrimeTunnel`). `PrimeTunnel.expose` uses it directly; the tool-serving `reachable_url` (still in runtimes) imports it lazily to avoid an import cycle. The host-wide `TUNNEL_LIMITER` stays in `runtimes.limiters` (a shared rate-limit resource, used by sandbox creation too) and is imported lazily inside `host_endpoint`. So the tunnel package no longer imports a runtime *method*.
- `InterceptionServer(tunnel)` binds where its tunnel says (`bind_host` / `bind_port`), instead of the pool threading those through as separate kwargs; no tunnel = loopback ephemeral (the bare-server default).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…or at call sites
- `Tunnel` is generic over its config (`Tunnel[ConfigT]`), so `CustomTunnel`'s `self.config` is typed without a quoted forward-ref. `single_server` is a ClassVar again (the pool reads it off the tunnel class).
- One dispatch, `tunnel_cls(config) -> type[Tunnel]` (dropped `make_tunnel`).
- `InterceptionPool.tunnel_cls` (public) holds the class and builds one tunnel instance per server. `InterceptionServer(tunnel)` holds it as `server.tunnel`, binds where the tunnel says, and exposes itself via `server.reachable(is_local)`.
- `TunnelError` is no longer raised inside the tunnel impls: `PrimeTunnel.expose` keeps only the inlined prime_tunnel mechanism + retry and lets a terminal failure propagate raw. The call sites classify it: `server.reachable` and `reachable_url` wrap tunnel *setup* (scoped via an exit stack, so a rollout-body error during `yield` isn't miscategorized).
- `reachable_url` (runtimes) reaches a host service via `PrimeTunnel().reachable`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The prime_tunnel host bridge moved from runtimes.base.host_endpoint into PrimeTunnel (interception.tunnel); update the doc/comment references that still named the removed symbol. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…lout path
- Remove the `modal` interception type (ModalInterceptionConfig / ModalTunnel / tunnel/modal.py). modal.forward only works from inside a Modal container, so it was awkward to ship; the modal *runtime* (harness sandboxes) is unaffected. Interception types are now `prime` (default) and `custom`.
- Drop the pool-less fallback in `Rollout._serve_interception` (it hardcoded a PrimeTunnel, ignoring the configured type). `episode()` only ever runs inside `Environment.serving()`, so the interception pool is always present: inline `self.interception.acquire(session)` and remove the dead branch + its imports.
- Move `TUNNEL_LIMITER` (+ public `TUNNELS_PER_MIN`) into the tunnel subpackage (`interception/tunnel/prime.py`), built from `runtimes.limiters.creation_limiter`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Move trailing comments to their own line / use a local for the repeated bind_host so ruff format leaves the interception server + pool untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…serving() run() calls self.interception.acquire() unconditionally, so the pool must always be present. It is — episode() only runs inside Environment.serving() — so make the contract explicit: Rollout.interception is now required (InterceptionPool, not Optional) and episode() raises a clear error if serving() isn't active, instead of a later AttributeError. (Does not restore the old pool-less fallback, which hardcoded PrimeTunnel and ignored interception.type.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…face) For a host the harness can reach directly (a public IP, or a trusted private network), `direct` binds the interception server(s) on `bind_host` (0.0.0.0 by default) and reaches each at `http://{host}:{port}` — no tunnel, no proxy. It multiplexes like prime (a server per `multiplex` rollouts, each on its own ephemeral port), so the firewall must allow those ports. Plaintext HTTP carrying the per-rollout secret, so it's for trusted networks only (documented). `Tunnel.bind_host` becomes a per-instance property (was a ClassVar) so `DirectTunnel` derives it from config; `tunnel_cls` dispatches `direct` -> `DirectTunnel`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…default bind_host now defaults to host (None -> host) instead of 0.0.0.0, so a direct server listens only on the interface it advertises — bind a private NIC and the port is never opened on the public NIC. host (the dialable URL address) and bind_host (the local listen interface) stay distinct (host can't be 0.0.0.0); bind_host is an optional override (0.0.0.0 for all interfaces, or a different local IP behind a 1:1 NAT). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…irect` type A reachable host doesn't need a separate type — `custom` with `url=http://<host>:<port>` is direct exposure, it just needed to bind a reachable interface instead of loopback. So add `bind_host` to CustomInterceptionConfig (default 127.0.0.1 for a same-host reverse proxy; set 0.0.0.0 / a public/LAN IP to expose `port` directly, no proxy) and remove the `direct` type (DirectTunnel / DirectInterceptionConfig / tunnel/direct.py). Interception types: prime, custom. Direct binds are plaintext HTTP carrying the per-rollout secret — trusted-network only (documented on the field). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Match the v0 cli_agent_env path (verifiers/utils/interception_utils.py binds 0.0.0.0 unconditionally): CustomTunnel binds all interfaces, so `url` is the only knob — a reverse proxy you front it with, or a direct `http://<host>:<port>` on a reachable host, both work with no extra field. Drop CustomInterceptionConfig.bind_host and revert Tunnel.bind_host to a ClassVar (prime stays 127.0.0.1, stricter than v0). The interception port is plaintext (auth'd by the per-rollout secret); fronting it with TLS/firewall on an untrusted network is the operator's job (security TBD). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- type Environment._interception as non-optional InterceptionPool (set in serving; drop the episode None-guard and the serving-finally reset)
- type tunnel_cls / InterceptionPool config params as InterceptionConfig (was BaseInterceptionConfig)
- unquote server.py forward-ref annotations via `from __future__ import annotations`
- trim the EnvConfig.interception docstring
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The serving() finally reset _shared_urls but left _interception pointing at the torn-down pool. del it instead so misuse outside serving() fails loudly (AttributeError, like before the first serving()) without reintroducing None. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
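The `del` approach can be shown with a toy class (`ToyEnvironment` is hypothetical, not the real Environment): declaring `_interception` only as an annotation means the attribute does not exist until `serving()` sets it, and deleting it in `finally` restores that state, so any use outside `serving()` raises AttributeError.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator


class ToyEnvironment:
    # Annotation only: the attribute does not exist until serving() sets it.
    _interception: object

    @asynccontextmanager
    async def serving(self) -> AsyncIterator[None]:
        self._interception = object()  # stands in for the live pool
        try:
            yield
        finally:
            # Delete rather than reset to None: later misuse raises AttributeError
            # instead of silently reaching a torn-down pool.
            del self._interception

    def interception(self) -> object:
        return self._interception
```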

# Conflicts:
#	verifiers/v1/runtimes/base.py

macroscopeapp · 2026-06-23T04:56:42Z

Approvability

Verdict: Needs human review

This PR introduces a new feature (configurable tunneling) with new abstractions, configuration types, and runtime behavior changes. Unresolved review comments identify potential bugs including a crash scenario with custom tunnels and improper exception propagation.


xeophon · 2026-06-23T13:22:00Z

+from verifiers.v1.interception.tunnel.prime import PrimeTunnel
+
+
+def tunnel_cls(config: InterceptionConfig) -> type[Tunnel]:


do we really need this function?

xeophon · 2026-06-23T13:23:59Z

+
+__all__ = [
+    "Tunnel",
+    "PrimeTunnel",


why do we export those? we only need the configs for downstream usage, no?

i think it's good practice + e.g. prime runtime is using it iirc

xeophon · 2026-06-23T13:25:14Z

+        # Each server owns its own tunnel instance: it binds where the tunnel reaches it
+        # (bind_host/bind_port) and exposes that bound port to the harness via `server.reachable`.
+        # Both are owned by the pool's stack, torn down with it (LIFO).
+        server = InterceptionServer(self.tunnel_cls(self.config))


when we use a custom tunnel, it will re-use the same port and then crash

custom tunnel is single-server by default. since you bind a single url, we cannot autoscale the interception servers like we can with prime tunnels, which we can provision and tear down on the fly, so i'd say this is by design
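The crash and the "by design" answer both come down to a socket rule: without options such as SO_REUSEPORT, only one socket can listen on a given address and port. The sketch below is plain `socket` code, not verifiers code, and `listen` / `second_server_fails` are hypothetical helpers; it binds a second listener to a port that is already taken, which is presumably what a second server on CustomTunnel's fixed `port` would run into.

```python
import errno
import socket


def listen(host: str, port: int) -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()  # don't leak the socket when binding fails
        raise
    return sock


def second_server_fails(host: str = "127.0.0.1") -> bool:
    """Bind one listener, then try a second on the same fixed port."""
    first = listen(host, 0)  # port 0: the OS picks a free port
    port = first.getsockname()[1]
    try:
        second = listen(host, port)  # what a second fixed-port server would do
    except OSError as err:
        return err.errno == errno.EADDRINUSE
    else:
        second.close()
        return False
    finally:
        first.close()
```

Prime avoids this because each pooled server binds an ephemeral port and gets its own tunnel, whereas one BYO url can only ever front one port.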

Per review: multiplex and the pool only make sense for prime (managing the prime_tunnel creation cap); a custom BYO endpoint is structurally one server.
- move `multiplex` off `BaseInterceptionConfig` onto `PrimeInterceptionConfig` (custom no longer carries a dead field)
- add an `Interception` ABC (one `acquire`); `InterceptionPool` (prime, multiplexed) and `SingleInterception` (custom, one server) implement it; `Environment.interception()` picks by config type
- drop `Tunnel.single_server`: the pool/single split now lives at the Interception level, not as a flag on the tunnel
- remove the `tunnel_cls` dispatch function: the pool is prime-only (always PrimeTunnel), SingleInterception builds CustomTunnel directly
- trim `interception.__all__` to the config types (tunnels are internal)
Verified: custom (SingleInterception) reward=1.0; prime (pool) reward=1.0 with tunnel lifecycle 1 appears / 0 leftover.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…Server
The custom case is one server, so a separate SingleInterception was just ceremony around one InterceptionServer. Make InterceptionServer itself the single-server Interception:
- InterceptionServer(tunnel, is_local) subclasses Interception: its __aenter__ binds AND makes itself reachable (sets base_url), and it gains acquire(); delete SingleInterception / single.py
- InterceptionPool keeps a server per multiplex slot and delegates acquire to server.acquire (slot logic lives in one place); PooledServer drops base_url
- make_interception(runtime, config) factory picks server (custom) vs pool (prime), mirroring make_runtime; Environment.interception() calls it
- base.py imports RolloutSession under TYPE_CHECKING to break the cycle
- fix the interception-down log to report the real bound host (drop the stale _HOST constant), which now matches the up log for a 0.0.0.0 (custom) bind
Verified: local multiplex growth (128 acquires -> 4 servers) + custom single server (128 sessions); real rollouts custom reward=1.0, prime reward=1.0 with tunnel lifecycle 1 appears / 0 leftover.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

macroscopeapp · 2026-06-23T20:51:11Z

+    async def __aexit__(self, *exc) -> None:
+        # tears down every server (+ its tunnel) on `_stack`, LIFO, even if one teardown fails
+        await self._stack.aclose()


🟢 Low interception/base.py:33

self._stack.aclose() always passes (None, None, None) to the stack's exit callbacks, so exceptions passed to Interception.__aexit__ are dropped and nested context managers that could suppress them never receive the exception info. The -> None return type also discards any suppression value the stack returns. Change the call to return await self._stack.__aexit__(*exc) and update the return type to bool | None.

-    async def __aexit__(self, *exc) -> None:
+    async def __aexit__(self, *exc) -> bool | None:
         # tears down every server (+ its tunnel) on `_stack`, LIFO, even if one teardown fails
-        await self._stack.aclose()
+        return await self._stack.__aexit__(*exc)

Evidence trail: verifiers/v1/interception/base.py lines 33-35 (REVIEWED_COMMIT): `async def __aexit__(self, *exc) -> None:` calls `await self._stack.aclose()` instead of `return await self._stack.__aexit__(*exc)`. CPython contextlib.py `AsyncExitStack.aclose` implementation: `async def aclose(self): await self.__aexit__(None, None, None)`, which confirms aclose always passes (None, None, None). Python docs on `close()` / `aclose()`: "the arguments passed in will indicate that no exception occurred."
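The behavior the review points at can be reproduced with the standard library alone. In this minimal sketch, `ClosesStack` mirrors the reviewed `__aexit__` (calling `aclose()`) and `ForwardsExc` applies the suggested change; `recorder` is a hypothetical nested teardown that notes whether it saw the in-flight exception and can optionally suppress it. None of these names come from the project.

```python
from __future__ import annotations

import asyncio
from contextlib import AsyncExitStack, asynccontextmanager


@asynccontextmanager
async def recorder(seen: list, suppress: bool):
    """A nested teardown that records whether it saw the in-flight exception."""
    try:
        yield
    except ValueError:
        seen.append("ValueError")
        if not suppress:
            raise
    else:
        seen.append(None)  # exited as if nothing went wrong


class ClosesStack:
    """Shape under review: __aexit__ calls aclose(), i.e. __aexit__(None, None, None)."""

    def __init__(self, seen: list, suppress: bool = False) -> None:
        self._stack = AsyncExitStack()
        self._seen, self._suppress = seen, suppress

    async def __aenter__(self) -> ClosesStack:
        await self._stack.enter_async_context(recorder(self._seen, self._suppress))
        return self

    async def __aexit__(self, *exc) -> None:
        await self._stack.aclose()


class ForwardsExc(ClosesStack):
    """Suggested shape: forward the exception info and the stack's suppression result."""

    async def __aexit__(self, *exc) -> bool | None:
        return await self._stack.__aexit__(*exc)


async def run(cls, suppress: bool = False) -> tuple[list, bool]:
    """Raise inside the block; report what the teardown saw and whether the error escaped."""
    seen: list = []
    try:
        async with cls(seen, suppress):
            raise ValueError("rollout failed")
    except ValueError:
        return seen, True
    return seen, False
```

With `aclose()` the nested teardown always sees a clean exit and can never suppress; forwarding `*exc` delivers the `ValueError` and lets a suppression result propagate. Whether interception teardown should ever suppress is a separate design choice the sketch does not settle.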

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).


Reviewed by Cursor Bugbot for commit b1e522e.

is_local is a deployment fact (is the consumer on the host network?), not a tunnel concern. The localhost short-circuit lived inside Tunnel.reachable, so even the local case went through the tunnel object. Make the tunnel purely about remote exposure:

- drop Tunnel.reachable; a Tunnel is now just bind_host/bind_port + expose()
- InterceptionServer.__aenter__ branches on is_local: local -> bind an ephemeral loopback port, reach at localhost (tunnel untouched); remote -> bind per the tunnel and expose()
- reachable_url likewise: local consumer -> localhost; remote -> PrimeTunnel().expose
- a local custom harness now ignores the BYO url/port (localhost on any port) instead of binding the fixed port it doesn't need

Verified: local prime 4 servers + custom-local uses ephemeral loopback (BYO ignored); remote custom reward=1.0; remote prime reward=1.0 with tunnel lifecycle 1 created/1 deleted (1 appears/0 leftover, run in isolation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
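A sketch of the `is_local` branch in `__aenter__`, assuming a tunnel exposes only `bind_host`, `bind_port`, and an async `expose(port)` that returns a public URL. `FakeTunnel` and `SketchServer` are hypothetical; the real CustomTunnel binds 0.0.0.0 on the configured port, whereas this fake binds loopback so the sketch can run anywhere.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class FakeTunnel:
    """Hypothetical Tunnel stand-in: only bind_host/bind_port + expose()."""

    bind_host: str
    bind_port: int
    public_url: str

    async def expose(self, port: int) -> str:
        # a PrimeTunnel would open a prime_tunnel here; a CustomTunnel just yields its url
        return self.public_url


class SketchServer:
    """Branch on is_local in __aenter__; the tunnel is only touched for remote consumers."""

    def __init__(self, tunnel: FakeTunnel, is_local: bool) -> None:
        self.tunnel = tunnel
        self.is_local = is_local
        self.base_url: str | None = None
        self._server: asyncio.AbstractServer | None = None

    async def __aenter__(self) -> SketchServer:
        if self.is_local:
            host, port = "127.0.0.1", 0  # ephemeral loopback port; BYO url/port ignored
        else:
            host, port = self.tunnel.bind_host, self.tunnel.bind_port
        self._server = await asyncio.start_server(self._handle, host, port)
        bound_port = self._server.sockets[0].getsockname()[1]
        if self.is_local:
            self.base_url = f"http://localhost:{bound_port}"
        else:
            self.base_url = await self.tunnel.expose(bound_port)
        return self

    async def __aexit__(self, *exc) -> None:
        self._server.close()
        await self._server.wait_closed()

    async def _handle(self, reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        writer.close()  # placeholder; the real server speaks HTTP
```

The local path never calls `expose()`, which is why a local custom harness can ignore the BYO url/port entirely.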

The interception was reached two ways: the harness used base_url (the configured tunnel, honoring custom), while non-colocated tool/user servers recomputed their own reach via reachable_url(HOST, state_port), which was hardcoded to prime. Unify to one base_url used by everyone:

- decide reachability from ALL consumers, not just the harness: expose via the configured tunnel if the harness OR any tool/user runtime is remote, localhost only when everything is local (Environment.interception reads tool/user runtimes off the first task, as shared_tools already does)
- harness reaches the model at {base_url}/v1; tool/user servers reach /state + /task at base_url directly: no per-server recompute, no second tunnel, and custom is honored everywhere
- collapse Slot to (base_url, secret); drop state_port and the reachable_url(HOST, state_port) paths in serve/serve_tools/serve_user
- make_interception / InterceptionPool now take is_local (bool); the policy lives in Environment

Verified: 23 fast tests; local /state e2e (tool_state, user, shared isolation) 6 passed; cross-runtime e2e harness-in-prime + tool-in-subprocess passed (one public base_url used by both a remote harness and a local tool); custom echo reward=1.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
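The reachability policy and the collapsed slot fit in a few lines. This is an illustrative sketch: `Slot` mirrors the described `(base_url, secret)` pair, but its URL properties and `interception_is_local` are hypothetical names, and the exact `/state` and `/task` paths are assumed from the description.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """What a rollout gets back: one base_url shared by every consumer, plus its secret."""

    base_url: str
    secret: str

    @property
    def model_url(self) -> str:
        return f"{self.base_url}/v1"  # the harness reaches the model here

    @property
    def state_url(self) -> str:
        return f"{self.base_url}/state"  # tool/user servers hit /state directly

    @property
    def task_url(self) -> str:
        return f"{self.base_url}/task"


def interception_is_local(harness_is_local: bool, tool_user_is_local: list[bool]) -> bool:
    """Localhost only if every consumer is local; any remote consumer forces expose()."""
    return harness_is_local and all(tool_user_is_local)
```

Because every consumer derives its URL from the same `base_url`, a custom endpoint is honored by tools and users as well as the harness.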

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Reachability between a service and its consumer is a cross-cutting serving concern, not a single runtime's. After the interception unification, reachable_url's only caller is mcp serving, yet it lived in runtimes/base.py and reached back into interception.tunnel (a lazy import to dodge a cycle). Move it next to its caller:

- relocate reachable_url to mcp/launch.py; it imports PrimeTunnel directly (no lazy import, no runtimes -> interception back-dependency)
- collapse the signature to reachable_url(service, port, *, colocated, consumer_is_local): drop the consumer Runtime|None duality + the service-is-consumer identity check; the caller passes the two bools it has. consumer_is_local for a tool is read off the harness runtime object (a HOST-driven user is local; a shared eval-level tool uses harness_is_local)
- delete the now-unused _Host / HOST sentinel (it existed only for reachable_url)
- drop the dead HOST-as-service handling + the stale "interception reachability" docstring; Runtime keeps only expose (the real per-runtime primitive)

Verified: ruff + 23 fast tests; /state e2e passes across all four reachability branches: host colocated/own-runtime (localhost), modal tool/user/shared (expose), and modal-harness + host-tool (tunnel).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
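The four branches named in the verification line can be read as a small decision table. This is a hypothetical sketch inferred from the commit message: `route`, `Route`, and the `service_is_local` flag are not the project's API (the real `reachable_url(service, port, *, colocated, consumer_is_local)` takes the service runtime and returns a URL).

```python
from enum import Enum


class Route(Enum):
    LOCALHOST = "localhost"  # consumer can dial the service's port directly
    EXPOSE = "expose"  # the service's own runtime publishes the port (Runtime.expose)
    TUNNEL = "tunnel"  # host-side service, remote consumer: tunnel the host port out


def route(*, colocated: bool, service_is_local: bool, consumer_is_local: bool) -> Route:
    """Pick how a consumer reaches a service (hypothetical mapping)."""
    if colocated:
        return Route.LOCALHOST  # same runtime as the consumer
    if not service_is_local:
        return Route.EXPOSE  # e.g. a tool/user server running in a modal sandbox
    # service runs on the host: localhost for host consumers, tunnel for remote ones
    return Route.LOCALHOST if consumer_is_local else Route.TUNNEL
```

Passing two bools instead of a consumer runtime keeps the decision a pure function of facts the caller already has.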

mikasenghaas and others added 11 commits June 22, 2026 23:37

chore(v1): drop redundant discriminator comment in interception config

7bcdf96

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

style(v1): keep interception lines under ruff's line length

1ac9f0c

Move trailing comments to their own line / use a local for the repeated bind_host so ruff format leaves the interception server + pool untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

macroscopeapp Bot reviewed Jun 23, 2026


Comment thread verifiers/v1/rollout.py Outdated

mikasenghaas and others added 6 commits June 23, 2026 01:24

mikasenghaas changed the title from "feat: pluggable InterceptionConfig (prime/modal/url) on EnvConfig" to "feat: pluggable InterceptionConfig (prime/custom) on EnvConfig" (Jun 23, 2026)

macroscopeapp Bot reviewed Jun 23, 2026


Comment thread verifiers/v1/env.py

mikasenghaas changed the title from "feat: pluggable InterceptionConfig (prime/custom) on EnvConfig" to "feat: configurable tunneling in interception" (Jun 23, 2026)

mikasenghaas requested a review from kcoopermiller June 23, 2026 04:43

mikasenghaas and others added 2 commits June 23, 2026 04:45

Merge remote-tracking branch 'origin/main' into feat/interception-config

821c7e6

# Conflicts:
#   verifiers/v1/runtimes/base.py

mikasenghaas requested a review from xeophon June 23, 2026 04:49

mikasenghaas marked this pull request as ready for review June 23, 2026 04:53

cursor Bot reviewed Jun 23, 2026


Comment thread verifiers/v1/env.py

xeophon reviewed Jun 23, 2026


cursor Bot reviewed Jun 23, 2026


Comment thread verifiers/v1/interception/single.py Outdated

Comment thread verifiers/v1/env.py

macroscopeapp Bot reviewed Jun 23, 2026


cursor Bot reviewed Jun 23, 2026


Comment thread verifiers/v1/interception/server.py

mikasenghaas and others added 4 commits June 23, 2026 21:00

style(interception): ruff format server.py

bd1ba16

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


feat: configurable tunneling in interception #1830

mikasenghaas wants to merge 25 commits into main from feat/interception-config

mikasenghaas commented Jun 22, 2026 (edited by macroscopeapp Bot)


macroscopeapp Bot commented Jun 23, 2026 (edited)

xeophon Jun 23, 2026

xeophon Jun 23, 2026

mikasenghaas Jun 23, 2026

xeophon Jun 23, 2026

mikasenghaas Jun 23, 2026

macroscopeapp Bot Jun 23, 2026

cursor Bot left a comment

2 participants

from verifiers.v1.interception.tunnel.prime import PrimeTunnel


def tunnel_cls(config: InterceptionConfig) -> type[Tunnel]:


Summary

What changed

Breaking

Verification


Changes since #1830 opened


Approvability
