[codex] Support raw image offload in v1 train client by eligotts · Pull Request #1746 · PrimeIntellect-ai/verifiers

eligotts · 2026-06-18T07:04:56Z

Design update — inline/offload image storage

This PR now follows the prime-rl multimodal image storage policy:

offload: current behavior, rewrite base64 data images to file:// run assets and require file-backed image URLs.
inline: keep data:image/...;base64,... URLs in the message payload and validate them without rewriting.

TrainClient now calls the policy-aware image preparation helper, so prime-rl can be the single source of truth via environment/config propagation.

Validation after latest push: uv run pytest tests/v1/test_train_client_multimodal.py -q passed (5 passed). Commit/push hooks also passed (ruff check, ruff format, generated AGENTS/CLAUDE check, ty).

Design update — dropped the `None`/cache-only image path

This PR and its companions (prime-rl #2836 / verifiers #1746 / renderers #89) no longer use the "send None for already-cached images" mechanism. Every image carries its raw descriptor ref at every slot (current and prior turns); /inference/v1/generate rematerializes each ref from disk every request.

Why: the None path coupled correctness to deployment (LRU cache present, single replica / DP-affinity, no eviction) and surfaced a miss as a hard vLLM EngineDeadError (qwen3-vl mrope dereferences a None image_grid_thw) that the retry net couldn't catch across the engine→API IPC. Dropping it is deployment-agnostic (a miss is impossible) and non-hacky. vLLM's mm_hash encoder cache still skips the expensive GPU re-encode for free — we only forgo the cheap IPC/CPU-reprocess dedup.

Validated: color-codeword (Qwen3-VL-4B) under DP=2, no affinity / no cache reliance: 0 crashes, 0 data=None, multi-turn accumulation correct, reward ~0.84. Also confirmed under TP.

This repo: with every image carrying its ref, no cache miss can occur — removed the retry subsystem (_generate_with_image_ref_retry, _has_descriptor_only_images, _retryable_mm_error_type, _json_error_type, _RETRYABLE_MM_ERROR_TYPES). Rollouts call renderers.client.generate directly. Obsolete retry tests removed.

Original description

Summary

tighten v1 multimodal graph serialization around strict raw descriptor sidecars
reject processed multimodal payload keys recursively, including nested pixel_values, image_embeds, and image_features
update v1 multimodal tests to use strict prime_raw_mm_item envelopes instead of descriptor-only Qwen payloads
keep raw image offload and retry behavior aligned with the companion Renderers and Prime-RL PRs

Companion PRs

Prime-RL: Support v1 raw multimodal image offload prime-rl#2836
Renderers: [codex] Support raw image refs for multimodal rendering renderers#89

Notes

Draft/WIP: this depends on the renderer generic raw multimodal ref contract in the companion PR.
v1 multimodal sidecars intentionally carry raw descriptors only, not processed image tensors or image-processor payloads.
Prime/vLLM materialization happens from raw image refs rather than Verifiers-held processor outputs.

Validation

Commit hooks: ruff check, ruff format, generated AGENTS/CLAUDE check passed.
Push hook: ty (ci parity) passed.
End-to-end hosted-style smoke through Prime-RL with /home/ubuntu/verifiers, /home/ubuntu/renderers, and /home/ubuntu/prime-rl-v1-raw-mm-offload completed inference, env rollouts, train batch creation, trainer step 0, and decoded strict trainer-bound raw image refs.

[!NOTE]

Add raw image offload and multimodal bridging support to v1 train client

Adds prepare_images_inplace in verifiers/utils/multimodal.py to walk nested message structures, offload image URLs to file-backed run assets, and enforce file:// URLs for multimodal training.

TrainClient overrides new prepare_request_body and prepare_messages hooks (defined on Client) to run image normalization before requests are sent.

InterceptionServer.handle_request now calls client preparation hooks on incoming requests and simulator-produced messages, surfacing failures as structured errors.

Multimodal graph handling is extended: _attribute_mm assigns per-message placeholders, MessageNode validates that only raw descriptors (not processed tensors) are serialized/deserialized, and PendingTurn.previous_multi_modal_data merges prefix multimodal data for turn bridging.

TrainClient.get_response no longer blocks bridging on multimodal detection; it passes previous_multi_modal_data into bridge_to_next_turn whenever the renderer is multimodal.

Risk: Requests with image URLs that cannot be offloaded to run assets will now raise RuntimeError at request time rather than failing silently later.

^{Macroscope summarized 9b3e7ee.}

Note

Medium Risk
Multimodal training now hard-requires offloadable file:// image URLs and a compatible renderers version; invalid or remote-only images fail at request time. Graph serialization rejects old processed multimodal payloads, so mixed-version rollouts could break until companions land.

Overview
Adds prepare_images_inplace so renderer-backed clients rewrite image parts to file:// run assets (via renderers.mm_store) before tokenization, and fail if images are not file-backed after preparation.

v1 TrainClient implements new prepare_request_body / prepare_messages hooks (also on the base Client); the interception server runs them on incoming bodies and user-simulator messages. Multi-turn bridging no longer skips multimodal prompts: it passes previous_multi_modal_data into bridge_to_next_turn when the renderer is multimodal.

Trace graph multimodal sidecars are raw descriptor dicts (raw_image_uri, hashes, placeholders)—processed tensors (pixel_values, etc.) are rejected on serialize/deserialize. _attribute_mm and branch merge now carry placeholder ranges; PendingTurn.previous_multi_modal_data merges prefix sidecars for bridging.

v0 RendererClient runs the same image prep in to_native_prompt. Legacy v0→v1 rollouts keep live cumulative multi_modal_data on the trajectory when building traces (avoiding JSON-save deltas).

^{Reviewed by Cursor Bugbot for commit 9b3e7ee. Bugbot is set up for automated code reviews on this repo. Configure here.}

Every image carries its ref, so no cache miss can occur. Removes _generate_with_image_ref_retry / _has_descriptor_only_images / _retryable_mm_error_type / _json_error_type / _RETRYABLE_MM_ERROR_TYPES; rollouts call generate() directly. Obsolete retry tests removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…fload # Conflicts: # verifiers/v1/clients/train.py

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

Want reviews to match your repository better? Bugbot Learning can learn team-specific rules from PR activity. A team admin can enable Learning in the Cursor dashboard.

^{Reviewed by Cursor Bugbot for commit 0b1d73f. Configure here.}

cursor · 2026-06-29T16:38:27Z

+    result = _offload_image_url(_image_source_url(source), image_dir)
+    if result is not None:
+        _set_image_source_url(source, result)
+    _require_file_image_url(source)


Inline image URLs always rejected

High Severity

The v1 train path always requires every image_url to become a file:// run asset after preparation, even when offload leaves a data:image/...;base64,... URL unchanged. That conflicts with the intended inline multimodal storage mode where base64 image URLs stay in the message payload, so inline training rollouts fail at request preparation instead of validating in place.

Additional Locations (1)

verifiers/v1/clients/train.py#L208-L216

^{Reviewed by Cursor Bugbot for commit 0b1d73f. Configure here.}

removed inline mode so this is irrelevant

macroscopeapp · 2026-06-29T16:38:58Z

Approvability

Verdict: Needs human review

2 blocking correctness issues found. This PR adds new multimodal image handling capabilities with significant new code and runtime behavior changes. Multiple unresolved review comments identify potential bugs including rejected inline images (high severity) and backwards compatibility concerns for existing saved rollouts.

No code changes detected at 9b3e7ee. Prior analysis still applies.

^{You can customize Macroscope's approvability policy. Learn more.}

macroscopeapp · 2026-06-29T19:47:01Z

+    return False
+
+
+def _validate_raw_mm_item(item: Any) -> dict[str, Any]:


🟡 Medium v1/graph.py:76

_validate_raw_mm_item now unconditionally rejects processed multimodal payloads containing keys like pixel_values, and deserialize_multi_modal_data runs it on every multi_modal_data field during deserialization. Loading a previously persisted multimodal v1 trace whose sidecars contain pixel_values now raises TypeError instead of round-tripping, breaking backwards compatibility for existing saved rollouts. Consider allowing processed payloads through on the deserialization path (e.g. by skipping the processed-key check in the validator's before path) so old traces can still be loaded.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:

In file @verifiers/v1/graph.py around line 76: `_validate_raw_mm_item` now unconditionally rejects processed multimodal payloads containing keys like `pixel_values`, and `deserialize_multi_modal_data` runs it on every `multi_modal_data` field during deserialization. Loading a previously persisted multimodal v1 trace whose sidecars contain `pixel_values` now raises `TypeError` instead of round-tripping, breaking backwards compatibility for existing saved rollouts. Consider allowing processed payloads through on the deserialization path (e.g. by skipping the processed-key check in the validator's `before` path) so old traces can still be loaded.

…fload

macroscopeapp · 2026-06-29T22:40:24Z

+        if value.get("type") == "image_url":
+            source = value.get("image_url")
+            if source is not None:
+                _prepare_image_source(source, image_dir=image_dir)


🟡 Medium utils/multimodal.py:64

prepare_images_inplace skips validation when an image_url part has a missing or None image_url field: lines 65-67 only call _prepare_image_source when source is not None, so the malformed part passes through unchecked. Downstream, ChatDialect.parse_request normalizes it to ImageUrlSource(url=""), forwarding a request with an empty image URL instead of rejecting it. Consider calling _require_file_image_url(value) (or otherwise validating) when source is None so malformed parts are rejected.

Suggested change

if value.get("type") == "image_url":

source = value.get("image_url")

if source is not None:

_prepare_image_source(source, image_dir=image_dir)

if value.get("type") == "image_url":

source = value.get("image_url")

if source is not None:

_prepare_image_source(source, image_dir=image_dir)

else:

_require_file_image_url(value)

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:

In file @verifiers/utils/multimodal.py around lines 64-67: `prepare_images_inplace` skips validation when an `image_url` part has a missing or `None` `image_url` field: lines 65-67 only call `_prepare_image_source` when `source is not None`, so the malformed part passes through unchecked. Downstream, `ChatDialect.parse_request` normalizes it to `ImageUrlSource(url="")`, forwarding a request with an empty image URL instead of rejecting it. Consider calling `_require_file_image_url(value)` (or otherwise validating) when `source` is `None` so malformed parts are rejected.

…fload # Conflicts: # verifiers/v1/cli/dashboard/eval.py

This was referenced Jun 18, 2026

[codex] Support raw image refs for multimodal rendering PrimeIntellect-ai/renderers#89

Open

Support v1 raw multimodal image offload PrimeIntellect-ai/prime-rl#2836

Open

eligotts force-pushed the codex/v1-raw-image-offload branch from 7556743 to 3f5bb1a Compare June 23, 2026 19:23

eligotts changed the base branch from feat/nano-as-v1 to main June 23, 2026 19:24

eligotts added 2 commits June 25, 2026 06:39

Support raw image offload in v1 train client

173a518

Enforce strict raw multimodal descriptors

de37650

eligotts force-pushed the codex/v1-raw-image-offload branch from 3f5bb1a to de37650 Compare June 25, 2026 06:40

macroscopeapp Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread verifiers/v1/clients/train.py

Comment thread verifiers/v1/graph.py Outdated

Comment thread verifiers/v1/clients/train.py Outdated

S1ro1 and others added 3 commits June 27, 2026 00:18

Merge remote-tracking branch 'origin/main' into codex/v1-raw-image-of…

9430999

…fload # Conflicts: # verifiers/v1/clients/train.py

feat: support inline multimodal images

4a7b37a

macroscopeapp Bot reviewed Jun 28, 2026

View reviewed changes

Comment thread verifiers/v1/utils/multimodal.py Outdated

Simplify v1 raw image offload path

0b1d73f

eligotts marked this pull request as ready for review June 29, 2026 16:36

cursor Bot reviewed Jun 29, 2026

View reviewed changes

eligotts added 3 commits June 29, 2026 17:41

Preserve v1 node usage in trace dumps

2d4969b

Surface request preparation failures on traces

7ade0b2

Require raw image URIs in v1 sidecars

0dc57a1

macroscopeapp Bot reviewed Jun 29, 2026

View reviewed changes

eligotts added 2 commits June 29, 2026 22:23

Share multimodal image preparation across clients

18b0fbe

Merge remote-tracking branch 'origin/main' into codex/v1-raw-image-of…

22c7cf4

…fload

macroscopeapp Bot reviewed Jun 29, 2026

View reviewed changes

Merge remote-tracking branch 'origin/main' into codex/v1-raw-image-of…

9b3e7ee

…fload # Conflicts: # verifiers/v1/cli/dashboard/eval.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[codex] Support raw image offload in v1 train client#1746

[codex] Support raw image offload in v1 train client#1746
eligotts wants to merge 12 commits into
mainfrom
codex/v1-raw-image-offload

eligotts commented Jun 18, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 29, 2026

Uh oh!

eligotts Jun 29, 2026

Uh oh!

macroscopeapp Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

macroscopeapp Bot Jun 29, 2026

Uh oh!

macroscopeapp Bot Jun 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		return False


		def _validate_raw_mm_item(item: Any) -> dict[str, Any]:

Uh oh!

Conversation

eligotts commented Jun 18, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Design update — inline/offload image storage

Design update — dropped the None/cache-only image path

Summary

Companion PRs

Notes

Validation

Add raw image offload and multimodal bridging support to v1 train client

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 29, 2026

Choose a reason for hiding this comment

Inline image URLs always rejected

Uh oh!

eligotts Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

macroscopeapp Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Approvability

Uh oh!

macroscopeapp Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

macroscopeapp Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

eligotts commented Jun 18, 2026 •

edited by macroscopeapp Bot

Loading

Design update — dropped the `None`/cache-only image path

macroscopeapp Bot commented Jun 29, 2026 •

edited

Loading