Define the Verifiers CLI lifecycle surface #1857
…o codex/eval-process-protocol # Conflicts: # verifiers/v1/cli/eval/main.py # verifiers/v1/cli/eval/resolver.py
continue
task = trace.task.model_dump(mode="json", exclude_none=True)
branches = trace.branches
main_messages = (
🟡 Medium cli/output.py:166
`convert_results_for_upload` sets `"prompt": []` and puts `branches[-1].messages` (the full root-to-leaf conversation, including the original prompt messages) into `"completion"`. This duplicates the prompt inside `completion` and loses the prompt/completion split, so consumers that concatenate `prompt + completion` or gate on both fields being non-empty will misrender or skip native-run transcripts. Consider splitting `branches[-1].messages` so the initial prompt messages populate `"prompt"` and the remaining messages populate `"completion"`.
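A minimal sketch of the suggested split, not the PR's implementation. It assumes messages are plain dicts with a `"role"` key and that the prompt is the leading run of messages before the first assistant turn. Both are assumptions, and the helper name `split_prompt_completion` is hypothetical.

```python
from typing import Any

Message = dict[str, Any]


def split_prompt_completion(
    messages: list[Message],
) -> tuple[list[Message], list[Message]]:
    """Split a root-to-leaf conversation into (prompt, completion).

    Assumption: everything before the first assistant message is prompt.
    """
    for index, message in enumerate(messages):
        if message.get("role") == "assistant":
            return messages[:index], messages[index:]
    # No assistant turn yet: treat the whole conversation as prompt.
    return list(messages), []
```

With this shape, `prompt + completion` reproduces the original message list without duplicating the prompt inside `completion`.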
"""Read JSONL results while reporting incomplete or invalid records to the caller."""
results: list[dict[str, Any]] = []
invalid: list[InvalidResultLine] = []
with path.open(encoding="utf-8") as handle:
🟡 Medium cli/output.py:82
`read_results` opens `results.jsonl` with `encoding="utf-8"` but does not catch `UnicodeDecodeError` inside the loop, so a single malformed UTF-8 line (e.g. from an interrupted write of a multibyte character) raises during `for line in handle` and aborts the entire read. The malformed line is never appended to `invalid`, so one corrupted line makes the whole run unloadable. Consider opening with `encoding="utf-8", errors="replace"` (or wrapping the iteration in a `try/except UnicodeDecodeError`) so partial corruption is reported per line instead of failing the whole load.
Suggested change:
- with path.open(encoding="utf-8") as handle:
+ with path.open(encoding="utf-8", errors="replace") as handle:
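As an alternative to `errors="replace"`, per-line reporting could be sketched as below. This is not the code in `cli/output.py`: it reads the file in binary mode and decodes each line separately so that a bad line is recorded instead of aborting the read. `InvalidLine` and `read_results_tolerant` are hypothetical stand-ins, since the fields of the real `InvalidResultLine` are not shown here.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class InvalidLine:
    """Hypothetical stand-in for InvalidResultLine (real fields unknown)."""

    line_number: int
    reason: str


def read_results_tolerant(
    path: Path,
) -> tuple[list[dict[str, Any]], list[InvalidLine]]:
    """Read JSONL, collecting undecodable or unparsable lines per line."""
    results: list[dict[str, Any]] = []
    invalid: list[InvalidLine] = []
    # Binary mode: decoding happens per line, so one bad line cannot
    # raise out of the file iterator and abort the whole read.
    with path.open("rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            if not raw.strip():
                continue  # skip blank lines
            try:
                record = json.loads(raw.decode("utf-8"))
            except UnicodeDecodeError as exc:
                invalid.append(InvalidLine(line_number, f"invalid UTF-8: {exc.reason}"))
                continue
            except json.JSONDecodeError as exc:
                invalid.append(InvalidLine(line_number, f"invalid JSON: {exc.msg}"))
                continue
            if isinstance(record, dict):
                results.append(record)
            else:
                invalid.append(InvalidLine(line_number, "record is not a JSON object"))
    return results, invalid
```

Compared with `errors="replace"`, this keeps the distinction between a decoding failure and a JSON failure, at the cost of a slightly longer loop.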
Overview
This PR makes Verifiers the authoritative local CLI and runtime surface for creating, validating, serving, evaluating, and optimizing environments. Prime-specific acquisition and account management stay in Prime CLI.
High-level changes
… eval, init, validate, serve, and gepa through verifiers.cli.CLI_MODULES.
… @ TOML loading.
… config.toml and results.jsonl as the native evaluation artifacts without an additional manifest.
Prime and Verifiers boundary
Prime CLI acquires owner/name[@version] packages and materializes its selected account as environment variables. Verifiers consumes the installed local package and those explicit runtime credentials; it does not read Prime profile files or perform platform transfers.
Verifiers continues to own the configuration and behavior of local evaluation, environment initialization and validation, serving, GEPA, plugin loading, execution, and artifact production.
Public surface
The typed command models, command-module registry, and artifact readers are exported for host applications. Bundled examples, templates, documentation, and Lab guidance use the same command and artifact contracts.
Companion change
Prime CLI integration: PrimeIntellect-ai/prime#760