[ENG-4321] Add raw v1 env CLI support for hosted evals#769

Open
mrmoxon wants to merge 3 commits into feat/v1-env-hosted-training from feat/v1-env-hosted-evals


@mrmoxon mrmoxon commented Jun 30, 2026


What This Enables

This lets users submit a hosted eval from a raw verifiers v1 config through the Prime CLI.

Before this PR, hosted eval TOML had to point at a published environment:

[[eval]]
env_id = "primeintellect/gsm8k"

After this PR, hosted eval TOML can also use raw v1 selectors:

[[eval]]
taskset = { id = "gsm8k-v1" }
harness = { id = "default" }

Then users can run:

prime eval run my-eval.toml --hosted

What It Builds On

Base branch: feat/v1-env-hosted-training

This builds on the existing v1 env CLI work from hosted training and reuses that shape for hosted eval submission.

The backend side is now consolidated in the platform PR:

  • Platform [ENG-4322, ENG-4323]: backend API accepts raw v1 hosted eval configs and the runner can execute them.

This Prime PR is the CLI half:

  • Prime ENG-4321: CLI accepts raw v1 hosted eval TOML and sends it in the environments payload field.

What Changed

  • Hosted eval TOML accepts either env_id or raw v1 taskset / harness.
  • Validation rejects missing selectors and mixed env_id + v1 selectors.
  • Hosted eval payloads send:
    • environment_ids for the legacy published-env path
    • environments for the raw v1 path
  • Legacy hosted eval configs using env_id continue to work.
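
The validation and payload rules listed above could be sketched roughly as follows. This is not the code in `evals.py` or `hosted_eval.py`: the function names, the error messages, and the exact shape of each `environments` item are assumptions made for illustration. Only the field names `environment_ids` and `environments` and the two rejection cases come from this description.

```python
from typing import Any

# Raw v1 selector keys named in the PR description.
V1_KEYS = ("taskset", "harness")


def validate_eval_entry(entry: dict[str, Any]) -> None:
    """Hypothetical check: reject missing selectors and mixed env_id + v1 selectors."""
    has_env_id = "env_id" in entry
    has_v1 = any(key in entry for key in V1_KEYS)
    if has_env_id and has_v1:
        raise ValueError("use either env_id or taskset/harness, not both")
    if not has_env_id and not has_v1:
        raise ValueError("missing selector: set env_id or taskset/harness")


def build_payload(entries: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical builder: route each entry to the legacy or raw v1 field."""
    payload: dict[str, Any] = {}
    for entry in entries:
        validate_eval_entry(entry)
        if "env_id" in entry:
            # Legacy published-env path.
            payload.setdefault("environment_ids", []).append(entry["env_id"])
        else:
            # Raw v1 path; the per-item shape here is an assumption.
            item = {key: entry[key] for key in V1_KEYS if key in entry}
            payload.setdefault("environments", []).append(item)
    return payload


legacy = build_payload([{"env_id": "primeintellect/gsm8k"}])
raw_v1 = build_payload(
    [{"taskset": {"id": "gsm8k-v1"}, "harness": {"id": "default"}}]
)
print(legacy)  # {'environment_ids': ['primeintellect/gsm8k']}
print(raw_v1)  # {'environments': [{'taskset': {'id': 'gsm8k-v1'}, 'harness': {'id': 'default'}}]}
```

Whether a raw v1 entry must set both `taskset` and `harness` is not stated here, so the sketch accepts either one.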

Files To Review

  1. packages/prime/src/prime_cli/commands/evals.py
    • TOML parsing, validation, hosted dispatch.
  2. packages/prime/src/prime_cli/utils/hosted_eval.py
    • Hosted eval request payload shape.
  3. packages/prime/tests/test_hosted_eval.py
    • Coverage for legacy env IDs, raw v1 envs, and validation errors.

Out Of Scope

  • No backend execution changes; covered by the platform PR.
  • No hosted training changes.
  • No local eval changes.
  • No UI changes.

Validation

uv run pytest packages/prime/tests/test_hosted_eval.py

Result: 79 passed

@mrmoxon mrmoxon changed the title [draft] Add raw v1 env CLI support for hosted evals [ENG-4321] Add raw v1 env CLI support for hosted evals Jun 30, 2026
@mrmoxon mrmoxon marked this pull request as ready for review June 30, 2026 05:10

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 58e659d5e1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/prime/src/prime_cli/commands/evals.py Outdated
@mrmoxon mrmoxon mentioned this pull request Jun 30, 2026

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit 6353b60.

Comment thread packages/prime/src/prime_cli/commands/evals.py
Comment thread packages/prime/src/prime_cli/commands/evals.py