add Verifiers v1 support to Prime CLI #758

Open
xeophon wants to merge 7 commits into main from codex/verifiers-v1-cli

@xeophon xeophon commented Jun 23, 2026

Member

Overview

Adds Verifiers v1 support to the Prime CLI while preserving compatibility with legacy and hosted evaluation flows.

What changed

  • routes local prime eval run and prime env init through the Verifiers v1 entrypoints
  • supports v1 taskset TOML configs, resume and dry-run flows, client headers, output directories, and trace conversion for Evals uploads
  • keeps generated hosted commands using --save-results on the legacy v0 evaluator
  • requires verifiers>=0.1.15.dev371 from PyPI instead of a Git source
  • regenerates the workspace lockfile for the Verifiers v1 dependency graph while retaining the repository's exclude-newer policy
  • keeps CI on the standard uv sync and uv run workflow for Python 3.11–3.13

User impact

Local evaluations use the Verifiers v1 CLI by default. Legacy and hosted-generated commands remain compatible with the existing hosted runner, while local v1 results can be uploaded through the Prime Evals flow.


Note

Medium Risk
Large changes to the default eval execution and upload path affect most local runs; legacy and hosted paths are retained but users on Python 3.10 or unpinned older verifiers will break.

Overview
Local prime eval run and prime env init now go through Verifiers v1 (verifiers.v1.cli.*), with plugin invocation using v1 console-style entrypoints and help text rewritten for prime commands.

v1 eval flow adds taskset TOML (@config), --resume, --dry-run, --client.base-url / --client.headers, default output dirs, post-run metadata.json, and convert_eval_results so v1 traces upload to Prime Evals while legacy results.jsonl still works. --save-results / -s keeps the v0 evaluator (hosted-generated commands). CLI --sampling-args is rejected for v1 unless legacy save mode; v1 uses --sampling.*.
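The routing rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: select_evaluator is a hypothetical helper, and --sampling.temperature in the usage lines is only an assumed example of a --sampling.* option.

```python
from typing import Sequence


def select_evaluator(args: Sequence[str]) -> str:
    """Pick "v0" or "v1" from raw CLI args (hypothetical helper).

    Assumption: --save-results / -s selects the legacy v0 evaluator,
    and --sampling-args is only valid in that legacy mode.
    """
    legacy = any(arg in ("--save-results", "-s") for arg in args)
    has_sampling_args = any(
        arg == "--sampling-args" or arg.startswith("--sampling-args=")
        for arg in args
    )
    if has_sampling_args and not legacy:
        raise ValueError(
            "--sampling-args is only accepted with --save-results; "
            "use --sampling.* options with the v1 evaluator"
        )
    return "v0" if legacy else "v1"


# Usage (flag values are illustrative only):
legacy_choice = select_evaluator(["--save-results", "--sampling-args", "{}"])
v1_choice = select_evaluator(["--sampling.temperature", "0.2"])
```

Keeping the decision in one pure function makes the rejection path easy to unit test without invoking either evaluator.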

Dependencies & platform: verifiers>=0.1.15.dev371 from PyPI (git pin removed), Python >=3.11,<3.14, CI/docker matrices 3.11–3.13 only. Root lock/trust metadata updated for the v1 dependency graph.
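The supported interpreter range stated above (>=3.11,<3.14) amounts to the check below. python_supported is a hypothetical name used only for illustration; the actual constraint is presumably enforced by the package metadata rather than by runtime code.

```python
import sys
from typing import Tuple


def python_supported(version: Tuple[int, int]) -> bool:
    """Return True when (major, minor) falls in >=3.11,<3.14."""
    return (3, 11) <= version < (3, 14)


# Check the running interpreter.
current_ok = python_supported(sys.version_info[:2])
```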

Reviewed by Cursor Bugbot for commit a3d41ab. Bugbot is set up for automated code reviews on this repo.

@xeophon xeophon changed the title [codex] add Verifiers v1 support to Prime CLI add Verifiers v1 support to Prime CLI Jun 23, 2026
@xeophon xeophon marked this pull request as ready for review June 23, 2026 12:30
Comment thread packages/prime/src/prime_cli/verifiers_bridge.py Outdated
Comment thread packages/prime/src/prime_cli/verifiers_bridge.py Outdated
Comment thread packages/prime/src/prime_cli/verifiers_bridge.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 518d3d9d3a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py Outdated
Comment thread packages/prime/src/prime_cli/verifiers_bridge.py Outdated
Comment thread packages/prime/src/prime_cli/verifiers_bridge.py
Comment thread packages/prime/src/prime_cli/verifiers_bridge.py
Comment thread packages/prime/src/prime_cli/verifiers_bridge.py Outdated

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit a3d41ab.

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py
Comment thread packages/prime/src/prime_cli/utils/eval_push.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a3d41abef6

saved_headers = saved_config.get("client", {}).get("headers", {})
job_id = saved_headers.get("X-PI-Job-Id") or _build_job_id(job_target, model)
display_id = saved_headers.get(INTERNAL_ENV_DISPLAY_HEADER)
upstream_slug = display_id if isinstance(display_id, str) and "/" in display_id else None

P2 Badge Validate display headers before reusing them as slugs

When resuming a run saved from a local environment that was ahead of its upstream, INTERNAL_ENV_DISPLAY_HEADER can contain a display string such as wiki-search (local - ahead of primeintellect/wiki-search), not an owner/name slug. This check accepts any value containing /, so push_eval_results_to_hub bypasses metadata lookup and tries to resolve/upload with that malformed slug, causing resumed uploads to fail or attach incorrectly. Only reuse the header when it is a real slug, otherwise fall back to metadata resolution.
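A stricter check along the lines suggested here could look like the sketch below. It is an illustration under assumptions, not the fix applied in the PR: as_upstream_slug is a hypothetical helper, and the slug grammar (owner/name, each part made of letters, digits, ".", "_" or "-") is assumed rather than taken from the hub's actual rules.

```python
import re
from typing import Optional

# Assumed slug grammar: exactly "owner/name" with no spaces or parentheses.
_SLUG_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")


def as_upstream_slug(display_id: object) -> Optional[str]:
    """Return display_id only if the whole string is an owner/name slug."""
    if isinstance(display_id, str) and _SLUG_RE.fullmatch(display_id):
        return display_id
    # Anything else should fall back to metadata resolution.
    return None
```

With fullmatch, a display string such as wiki-search (local - ahead of primeintellect/wiki-search) is rejected because of the spaces and parentheses, while primeintellect/wiki-search is accepted.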

@mikasenghaas mikasenghaas left a comment

Member

[Image: Screenshot 2026-06-23 at 12 35 49 PM]

we get a weird merge of the typer cli (prime cli) and the pydantic config (vf/ prl). imo the broader question is whether we want to adopt pydantic config everywhere. that would prob need some changes to the look + feel, and for subcommands to work etc. maybe a nicer fusion is possible tho

from verifiers.v1.task import WireTask
from verifiers.v1.trace import Trace

trace_type = Trace[WireTask]

Member

i think we acc have a WireTrace type for this built-in iirc

main_messages = (
    [
        message.model_dump(mode="json", exclude_none=True)
        for message in branches[-1].messages

Member

should we index with 0 instead of -1?
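For context on this question: the two indices select the same branch only when a trace has exactly one branch. The stand-in sketch below makes that concrete. The Branch dataclass is hypothetical and is not the verifiers type, and the sketch does not say which branch is the right one to treat as "main".

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Branch:
    # Stand-in for whatever object exposes `.messages` in the real trace.
    messages: List[str] = field(default_factory=list)


def first_and_last_messages(branches: List[Branch]) -> Tuple[List[str], List[str]]:
    """Return (messages of branches[0], messages of branches[-1])."""
    return branches[0].messages, branches[-1].messages


single = [Branch(["hello"])]
multi = [Branch(["root turn"]), Branch(["root turn", "later turn"])]
```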

        message.model_dump(mode="json", exclude_none=True)
        for message in branch.messages
    ],
    "reward": trace.reward,

Member

was this v0 behavior? i thought step reward was None by default?

Comment on lines +115 to +116
"prompt": [],
"completion": main_messages,

Member

i think this is fair, but need to think abt if this will look weird on platform somehow. iirc one table is showing the prompt as a col (we prob shouldnt do this anyways on platform) but yea smth to look out for

_print_environment_source_footer(resolved_env)
return

resume_dir = _parse_value_option(passthrough_args, "--resume", None)

Member

hmm getting a bit of smell from this code block. feels like this can be simplified
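One possible direction for a simplification, sketched under assumptions: a single helper that handles both --resume DIR and --resume=DIR. The name parse_value_option and its signature are hypothetical and do not reproduce _parse_value_option from the PR, whose third argument may mean something else.

```python
from typing import Optional, Sequence


def parse_value_option(
    args: Sequence[str], name: str, default: Optional[str] = None
) -> Optional[str]:
    """Return the value given for `name`, accepting both
    `name VALUE` and `name=VALUE`; fall back to `default`."""
    prefix = name + "="
    for index, arg in enumerate(args):
        if arg.startswith(prefix):
            return arg[len(prefix):]
        if arg == name and index + 1 < len(args):
            return args[index + 1]
    return default


# Usage (the directory name is illustrative only):
resume_dir = parse_value_option(["--resume", "outputs/run-1"], "--resume")
```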
