add Verifiers v1 support to Prime CLI by xeophon · Pull Request #758 · PrimeIntellect-ai/prime

xeophon · 2026-06-23T09:34:40Z

Overview

Adds Verifiers v1 support to the Prime CLI while preserving compatibility with legacy and hosted evaluation flows.

What changed

routes local prime eval run and prime env init through the Verifiers v1 entrypoints
supports v1 taskset TOML configs, resume and dry-run flows, client headers, output directories, and trace conversion for Evals uploads
keeps generated hosted commands using --save-results on the legacy v0 evaluator
requires verifiers>=0.1.15.dev371 from PyPI instead of a Git source
regenerates the workspace lockfile for the Verifiers v1 dependency graph while retaining the repository's exclude-newer policy
keeps CI on the standard uv sync and uv run workflow for Python 3.11–3.13

User impact

Local evaluations use the Verifiers v1 CLI by default. Legacy and hosted-generated commands remain compatible with the existing hosted runner, while local v1 results can be uploaded through the Prime Evals flow.

Note

Medium Risk
Large changes to the default eval execution and upload path affect most local runs. Legacy and hosted paths are retained, but users on Python 3.10 or on an unpinned older verifiers release will see breakage.

Overview
Local prime eval run and prime env init now go through Verifiers v1 (verifiers.v1.cli.*), with plugin invocation using v1 console-style entrypoints and help text rewritten for prime commands.

The v1 eval flow adds taskset TOML (@config), --resume, --dry-run, --client.base-url / --client.headers, default output dirs, a post-run metadata.json, and convert_eval_results so that v1 traces upload to Prime Evals while legacy results.jsonl still works. --save-results / -s keeps the v0 evaluator (used by hosted-generated commands). The CLI rejects --sampling-args for v1 runs unless legacy save mode is active; v1 uses --sampling.* instead.

Dependencies & platform: verifiers>=0.1.15.dev371 from PyPI (git pin removed), Python >=3.11,<3.14, CI/docker matrices 3.11–3.13 only. Root lock/trust metadata updated for the v1 dependency graph.

^{Reviewed by Cursor Bugbot for commit a3d41ab. Bugbot is set up for automated code reviews on this repo. Configure here.}
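
The sampling-flag rule in the summary above (reject --sampling-args for v1 unless the legacy save flag is present) could be sketched roughly as follows. This is an illustration only, not the code from this PR: the function names, the error type, and the message text are assumptions, and the real bridge may detect the flags differently.

```python
from typing import Sequence

# Flags that, per the PR summary, keep a run on the legacy v0 evaluator.
LEGACY_SAVE_FLAGS = {"--save-results", "-s"}


def uses_legacy_evaluator(args: Sequence[str]) -> bool:
    """Return True when a legacy save flag appears in the argument list."""
    return any(arg in LEGACY_SAVE_FLAGS for arg in args)


def check_sampling_flags(args: Sequence[str]) -> None:
    """Raise ValueError if --sampling-args is passed without legacy save mode.

    Hypothetical helper: the name and the error type are illustrative.
    """
    has_sampling_args = any(
        arg == "--sampling-args" or arg.startswith("--sampling-args=")
        for arg in args
    )
    if has_sampling_args and not uses_legacy_evaluator(args):
        raise ValueError(
            "--sampling-args is only accepted together with --save-results / -s; "
            "use the --sampling.* options for v1 runs"
        )
```

Calling `check_sampling_flags` on the passthrough arguments before dispatching to either evaluator would surface the mismatch early instead of failing inside the v1 CLI.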

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 518d3d9d3a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a3d41abef6

chatgpt-codex-connector · 2026-06-23T12:56:36Z

+        saved_headers = saved_config.get("client", {}).get("headers", {})
+        job_id = saved_headers.get("X-PI-Job-Id") or _build_job_id(job_target, model)
+        display_id = saved_headers.get(INTERNAL_ENV_DISPLAY_HEADER)
+        upstream_slug = display_id if isinstance(display_id, str) and "/" in display_id else None


Validate display headers before reusing them as slugs

When resuming a run saved from a local environment that was ahead of its upstream, INTERNAL_ENV_DISPLAY_HEADER can contain a display string such as wiki-search (local - ahead of primeintellect/wiki-search), not an owner/name slug. This check accepts any value containing /, so push_eval_results_to_hub bypasses metadata lookup and tries to resolve/upload with that malformed slug, causing resumed uploads to fail or attach incorrectly. Only reuse the header when it is a real slug, otherwise fall back to metadata resolution.
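
A stricter check along the lines suggested here might look like the sketch below. It assumes a slug is exactly `owner/name` with each part limited to letters, digits, `.`, `_`, and `-`; the actual slug grammar used by the platform is not shown in this thread, and `slug_from_display_header` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumed slug shape: owner/name, no spaces or parentheses, exactly one slash.
_SLUG_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")


def slug_from_display_header(value: object) -> Optional[str]:
    """Return the header value only when it is a well-formed owner/name slug.

    Anything else (display strings, non-strings, missing headers) yields None,
    so the caller can fall back to metadata resolution.
    """
    if not isinstance(value, str):
        return None
    return value if _SLUG_RE.fullmatch(value) else None
```

With this shape, `primeintellect/wiki-search` is reused as-is, while `wiki-search (local - ahead of primeintellect/wiki-search)` is rejected because `fullmatch` requires the whole string to be a slug rather than merely contain one.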

mikasenghaas

we get a weird merge of typer cli (prime cli) and pydantic config (vf/prl). imo there's a broader question of whether we want to adopt pydantic config everywhere. would prob need some changes for look + feel, and for subcommands to work etc. maybe a nicer fusion is possible tho

mikasenghaas · 2026-06-23T19:31:37Z

+            from verifiers.v1.task import WireTask
+            from verifiers.v1.trace import Trace
+
+            trace_type = Trace[WireTask]


i think we acc have a WireTrace type for this built-in iirc

mikasenghaas · 2026-06-23T19:32:31Z

+        main_messages = (
+            [
+                message.model_dump(mode="json", exclude_none=True)
+                for message in branches[-1].messages


should we index with 0 instead of -1?

mikasenghaas · 2026-06-23T19:33:03Z

+                    message.model_dump(mode="json", exclude_none=True)
+                    for message in branch.messages
+                ],
+                "reward": trace.reward,


was this v0 behavior? i thought step reward was None by default?

mikasenghaas · 2026-06-23T19:33:44Z

+                "prompt": [],
+                "completion": main_messages,


i think this is fair, but need to think abt if this will look weird on platform somehow. iirc one table is showing the prompt as a col (we prob shouldnt do this anyways on platform) but yea smth to look out for

mikasenghaas · 2026-06-23T19:35:17Z

+        _print_environment_source_footer(resolved_env)
+        return
+
+    resume_dir = _parse_value_option(passthrough_args, "--resume", None)


hmm getting a bit of smell from this code block. feels like this can be simplified
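
One possible direction for the simplification raised here is a single helper that handles both the `--flag value` and `--flag=value` spellings, so each call site stays a one-liner. The sketch below is hypothetical: the real signature and behavior of `_parse_value_option` are not shown in this thread, and the assumption that its third argument is an optional short alias is a guess.

```python
from typing import Optional, Sequence


def parse_value_option(
    args: Sequence[str],
    long_flag: str,
    short_flag: Optional[str] = None,
) -> Optional[str]:
    """Return the value of the last occurrence of a flag, or None if absent.

    Accepts both "--flag value" and "--flag=value". A trailing flag with no
    value is ignored.
    """
    flags = [flag for flag in (long_flag, short_flag) if flag]
    value: Optional[str] = None
    for index, arg in enumerate(args):
        for flag in flags:
            if arg == flag and index + 1 < len(args):
                # Space-separated form: the value is the next argument.
                value = args[index + 1]
            elif arg.startswith(flag + "="):
                # Inline form: the value follows the equals sign.
                value = arg[len(flag) + 1:]
    return value
```

For example, `parse_value_option(passthrough_args, "--resume")` would return the resume directory for either spelling and `None` when the flag is missing, which keeps the branching out of the command body.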

xeophon added 3 commits June 23, 2026 11:34

add Verifiers v1 CLI support

fa0c9b9

update Verifiers dependency lock

b2d2c80

fast-track Verifiers v1 dependencies

518d3d9

xeophon changed the title ~~[codex] add Verifiers v1 support to Prime CLI~~ add Verifiers v1 support to Prime CLI Jun 23, 2026

xeophon marked this pull request as ready for review June 23, 2026 12:30

xeophon requested review from JannikSt, JohannesHa, burnpiro, d42me, kcoopermiller and willccbb as code owners June 23, 2026 12:30

cursor Bot reviewed Jun 23, 2026

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py Outdated

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py Outdated

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py Outdated

chatgpt-codex-connector Bot reviewed Jun 23, 2026

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py Outdated

fix verifiers eval bridge edge cases

e126a20

cursor Bot reviewed Jun 23, 2026

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py Outdated

fix v1 resume result upload

d588449

cursor Bot reviewed Jun 23, 2026

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py Outdated

xeophon added 2 commits June 23, 2026 14:50

handle verifiers legacy flags and env paths

3f9723e

forward configured v1 inference URL

a3d41ab

cursor Bot reviewed Jun 23, 2026

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py

Comment thread packages/prime/src/prime_cli/utils/eval_push.py

chatgpt-codex-connector Bot reviewed Jun 23, 2026

mikasenghaas reviewed Jun 23, 2026

xeophon mentioned this pull request Jun 24, 2026

Rewrite Prime CLI around Pydantic configs #760

Draft

2 participants
