
Define the Verifiers CLI lifecycle surface #1857

Draft
xeophon wants to merge 16 commits into main from codex/eval-process-protocol



xeophon commented Jun 24, 2026

Member

Overview

This PR makes Verifiers the authoritative local CLI and runtime surface for creating, validating, serving, evaluating, and optimizing environments. Prime-specific acquisition and account management stay in Prime CLI.

High-level changes

  • Exports eval, init, validate, serve, and gepa through verifiers.cli.CLI_MODULES.
  • Uses strict Pydantic configuration for the complete local lifecycle, including native @ TOML loading.
  • Makes V1 tasksets the default scaffold while retaining explicit V0 creation, evaluation, and serving.
  • Supports typed GEPA configuration for one or multiple local environments.
  • Restricts taskset, harness, and V0 environment IDs to built-in or locally importable packages.
  • Removes Prime profile loading, Hub reference resolution, downloads, caching, and installation from Verifiers.
  • Keeps config.toml and results.jsonl as the native evaluation artifacts without an additional manifest.
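
The restriction on taskset, harness, and V0 environment IDs can be illustrated with a minimal sketch. This is not the PR's resolver: `is_locally_importable`, `resolve_local_id`, and the hyphen-to-underscore normalization are hypothetical names and assumptions. It only shows how an ID might be accepted when it is built in or already importable, and rejected otherwise, without any download or install step.

```python
import importlib.util


def is_locally_importable(package_id: str) -> bool:
    """Return True if a module for this ID can be found on the current import path."""
    # Assumption: distribution-style IDs use hyphens while module names use underscores.
    module_name = package_id.replace("-", "_")
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for a missing parent package or an empty/invalid module name.
        return False


def resolve_local_id(package_id: str, builtins: frozenset[str]) -> str:
    """Accept only built-in or locally importable IDs; never fetch anything."""
    if package_id in builtins or is_locally_importable(package_id):
        return package_id
    raise ValueError(
        f"{package_id!r} is neither built in nor locally importable; install it first"
    )
```

Under this sketch, acquiring a missing package stays the caller's job (Prime CLI, per the description), and the resolver only reports that the package is absent.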

Prime and Verifiers boundary

Prime CLI acquires owner/name[@version] packages and materializes its selected account as environment variables. Verifiers consumes the installed local package and those explicit runtime credentials; it does not read Prime profile files or perform platform transfers.

Verifiers continues to own the configuration and behavior of local evaluation, environment initialization and validation, serving, GEPA, plugin loading, execution, and artifact production.
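
The boundary described above can be sketched as a loader that reads credentials only from explicit environment variables and never from a profile file. The variable names `EXAMPLE_API_KEY` and `EXAMPLE_BASE_URL`, the `RuntimeCredentials` type, and `credentials_from_env` are placeholders invented for this illustration; the PR description does not name the actual variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Placeholder names for illustration only; the real variable names are not given in the PR.
REQUIRED_VARS = ("EXAMPLE_API_KEY", "EXAMPLE_BASE_URL")


@dataclass(frozen=True)
class RuntimeCredentials:
    api_key: str
    base_url: str


def credentials_from_env(env: Mapping[str, str] = os.environ) -> RuntimeCredentials:
    """Build credentials from the environment; fail loudly instead of reading profiles."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing runtime credentials: {', '.join(missing)}")
    return RuntimeCredentials(
        api_key=env["EXAMPLE_API_KEY"],
        base_url=env["EXAMPLE_BASE_URL"],
    )
```

Passing the environment as a mapping keeps the function testable and makes the dependency on the host process explicit.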

Public surface

The typed command models, command-module registry, and artifact readers are exported for host applications. Bundled examples, templates, documentation, and Lab guidance use the same command and artifact contracts.

Companion change

Prime CLI integration: PrimeIntellect-ai/prime#760

Comment thread verifiers/v1/cli/eval/main.py Outdated
Comment thread verifiers/v1/cli/eval/resolver.py Outdated
Comment thread verifiers/v1/cli/eval/resolver.py Outdated
xeophon added 2 commits June 25, 2026 15:17
…o codex/eval-process-protocol

# Conflicts:
#	verifiers/v1/cli/eval/main.py
#	verifiers/v1/cli/eval/resolver.py
xeophon changed the title from "[codex] add versioned eval process protocol" to "Add versioned eval process protocol" Jun 25, 2026
Comment thread verifiers/v1/cli/eval/main.py Outdated
xeophon changed the title from "Add versioned eval process protocol" to "Simplify the V1 eval CLI for host delegation" Jun 26, 2026
Comment thread verifiers/cli/plugins/prime.py Outdated
xeophon changed the title from "Simplify the V1 eval CLI for host delegation" to "Define the Verifiers CLI lifecycle surface" Jun 26, 2026
Comment thread verifiers/utils/install_utils.py Outdated
Comment thread verifiers/v1/utils/install.py Outdated
Comment thread verifiers/utils/client_utils.py Outdated
Comment thread verifiers/utils/install_utils.py Outdated
Comment thread verifiers/gepa/config.py
Comment thread verifiers/scripts/gepa.py
Comment thread verifiers/utils/install_utils.py Outdated
Comment thread verifiers/utils/install_utils.py Outdated
Comment thread verifiers/v1/utils/install.py Outdated
Comment thread docs/overview.md Outdated
Comment thread verifiers/v1/cli/output.py Outdated
continue
task = trace.task.model_dump(mode="json", exclude_none=True)
branches = trace.branches
main_messages = (


🟡 Medium cli/output.py:166

convert_results_for_upload sets "prompt": [] and puts branches[-1].messages (the full root-to-leaf conversation, including the original prompt messages) into "completion". This duplicates the prompt inside completion and loses the prompt/completion split, so consumers that concatenate prompt + completion or gate on both fields being non-empty will misrender or skip native-run transcripts. Consider splitting branches[-1].messages so the initial prompt messages populate "prompt" and the remaining messages populate "completion".
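
One possible shape for the suggested split, as a sketch only: it assumes messages are dicts with a `"role"` key and treats everything before the first assistant message as the prompt. Both the helper name `split_prompt_completion` and that heuristic are assumptions. If the trace records the prompt length explicitly, slicing on that value would be more reliable.

```python
from typing import Any

Message = dict[str, Any]


def split_prompt_completion(
    messages: list[Message],
) -> tuple[list[Message], list[Message]]:
    """Split a root-to-leaf transcript at the first assistant message."""
    for index, message in enumerate(messages):
        if message.get("role") == "assistant":
            # Everything before the first assistant turn is treated as the prompt.
            return messages[:index], messages[index:]
    # No assistant turn at all: the whole transcript is prompt, completion is empty.
    return list(messages), []
```

With this split, `prompt + completion` reproduces the original transcript without duplicating the prompt messages.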


"""Read JSONL results while reporting incomplete or invalid records to the caller."""
results: list[dict[str, Any]] = []
invalid: list[InvalidResultLine] = []
with path.open(encoding="utf-8") as handle:


🟡 Medium cli/output.py:82

read_results opens results.jsonl with encoding="utf-8" but does not catch UnicodeDecodeError inside the loop, so a single malformed UTF-8 line (e.g. from an interrupted write of a multibyte character) raises during for line in handle and aborts the entire read. The malformed line is never appended to invalid, so one corrupted line makes the whole run unloadable. Consider opening with encoding="utf-8", errors="replace" (or wrapping the iteration in a try/except UnicodeDecodeError) so partial corruption is reported per line instead of failing the whole load.

Suggested change
- with path.open(encoding="utf-8") as handle:
+ with path.open(encoding="utf-8", errors="replace") as handle:
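
As an alternative to `errors="replace"`, the file can be read in binary mode and decoded one line at a time, so a bad line is recorded and the rest still load. The sketch below is not the PR's `read_results`: `InvalidLine` is a hypothetical stand-in for `InvalidResultLine`, whose real fields are not shown here, and `read_jsonl_tolerant` is an invented name.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class InvalidLine:  # hypothetical stand-in for InvalidResultLine
    line_number: int
    reason: str


def read_jsonl_tolerant(path: Path) -> tuple[list[dict[str, Any]], list[InvalidLine]]:
    """Read JSONL, reporting undecodable or unparsable lines instead of aborting."""
    results: list[dict[str, Any]] = []
    invalid: list[InvalidLine] = []
    # Binary mode: splitting on b"\n" is safe because UTF-8 multibyte
    # sequences never contain the newline byte.
    with path.open("rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            if not raw.strip():
                continue  # skip blank lines
            try:
                record = json.loads(raw.decode("utf-8"))
            except UnicodeDecodeError as exc:
                invalid.append(InvalidLine(line_number, f"invalid UTF-8: {exc.reason}"))
                continue
            except json.JSONDecodeError as exc:
                invalid.append(InvalidLine(line_number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                invalid.append(InvalidLine(line_number, "record is not a JSON object"))
                continue
            results.append(record)
    return results, invalid
```

One trade-off to weigh: `errors="replace"` substitutes U+FFFD for bad bytes, so a line whose corruption sits inside a JSON string can still parse and be accepted with altered content, whereas decoding per line keeps such a record flagged as invalid.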
