feat(v1): add Trace.last_reply and adopt it across v1 envs by xeophon · Pull Request #1897 · PrimeIntellect-ai/verifiers

xeophon · 2026-06-30T13:47:05Z

Description

Adds Trace.last_reply — the stripped text of the final sampled assistant message (or "" when there are no assistant messages / the last one has no text) — and adopts it across the v1 envs and docs that read the model's final reply as text for scoring/judging.

Every scoring reward that wants "the answer" hand-rolled the same idiom: assistant_messages[-1].content if assistant_messages else "", or the unguarded assistant_messages[-1].content. last_reply is a typed convenience over that idiom and is safe by default: it returns "" instead of raising IndexError on an empty trace, and because AssistantMessage.content is str | None, it handles the None case too.
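
A minimal sketch of the property and the empty-trace case it guards against. The `AssistantMessage` and `Trace` classes here are simplified stand-ins written for illustration, not the real `verifiers` types (in particular, `assistant_messages` is assumed to be a plain list); only the `last_reply` expression is taken from this PR.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AssistantMessage:
    # Stand-in for the real message type: content may be None.
    content: Optional[str] = None


@dataclass
class Trace:
    # Stand-in for vf.Trace, reduced to the one field needed here.
    assistant_messages: list[AssistantMessage] = field(default_factory=list)

    @property
    def last_reply(self) -> str:
        """Stripped text of the final assistant message, or "" if there is none."""
        msgs = self.assistant_messages
        return (msgs[-1].content or "").strip() if msgs else ""


empty = Trace()

# The unguarded idiom raises on a trace with no assistant turns...
try:
    empty.assistant_messages[-1].content
    unguarded_raised = False
except IndexError:
    unguarded_raised = True

# ...while last_reply falls back to an empty string.
safe_value = empty.last_reply
```

Returning `""` rather than `None` lets callers pass the value straight into string checks without a second guard.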

Type of Change

New feature (non-breaking change which adds functionality)
Test improvement
Documentation update

What changed

verifiers/v1/trace.py — new last_reply property (@property, next to assistant_messages): (msgs[-1].content or "").strip() if msgs else "".
tests/v1/test_trace.py — test_last_reply: newest sampled turn wins (" a1 " then "a2" → "a2"), no turns → "", None content → "".
v1 envs migrated to trace.last_reply (each previously hand-rolled the guard): reverse-text-v1, gsm8k-v1, glossary-v1, deepwiki-v1, code-golf-v1, scratchpad-v1, wiki-search-v1.
verifiers/v1/GUIDE.md — the canonical reverse-text reward and two other snippets now use trace.last_reply.
Not migrated (intentionally): alphabet-sort-v1 iterates every assistant turn ([m.content or "" for m in trace.assistant_messages]); color-codeword-v1 takes the AssistantMessage objects themselves. Neither is "last reply" semantics, so they stay on assistant_messages.
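
The migration can be sketched as a before/after pair, alongside the per-turn pattern that was deliberately left alone. The stand-in classes and the three helper functions (`final_text_before`, `final_text_after`, `all_turn_texts`) are hypothetical names chosen for this example; the actual env rewards are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AssistantMessage:
    content: Optional[str] = None  # stand-in type, not the real class


@dataclass
class Trace:
    assistant_messages: list[AssistantMessage] = field(default_factory=list)

    @property
    def last_reply(self) -> str:
        msgs = self.assistant_messages
        return (msgs[-1].content or "").strip() if msgs else ""


def final_text_before(trace: Trace) -> Optional[str]:
    # Hand-rolled guard: handles the empty trace, but can still return None
    # and does not strip surrounding whitespace.
    msgs = trace.assistant_messages
    return msgs[-1].content if msgs else ""


def final_text_after(trace: Trace) -> str:
    # Migrated form: always a stripped str.
    return trace.last_reply


def all_turn_texts(trace: Trace) -> list[str]:
    # Per-turn access (alphabet-sort style) is not "last reply" semantics,
    # so it keeps reading assistant_messages directly.
    return [m.content or "" for m in trace.assistant_messages]


trace = Trace([AssistantMessage("first"), AssistantMessage(" second ")])
```

With this trace, the old helper yields the unstripped final text, the new one yields the stripped text, and the per-turn helper still sees both turns.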

Testing

New tests have been added to cover the changes — tests/v1/test_trace.py::test_last_reply (passes; full test_trace.py green, 4/4).
All migrated env tasksets import against the patched source (syntactically + semantically valid against the new Trace.last_reply).
ruff check clean on all changed files.
The @pytest.mark.e2e env runs (real model rollouts + runtimes) are not executed locally; the change is a read-only property with no new imports and is covered by the unit test + import check.
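
For reference, the three cases the unit test is described as covering could look roughly like this. This is a reconstruction from the description above, not the contents of tests/v1/test_trace.py; the stand-in classes and the `make_trace` helper are assumptions made so the sketch runs on its own.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AssistantMessage:
    content: Optional[str] = None  # stand-in type


@dataclass
class Trace:
    assistant_messages: list[AssistantMessage] = field(default_factory=list)

    @property
    def last_reply(self) -> str:
        msgs = self.assistant_messages
        return (msgs[-1].content or "").strip() if msgs else ""


def make_trace(*contents: Optional[str]) -> Trace:
    # Hypothetical helper: one assistant message per given content value.
    return Trace([AssistantMessage(c) for c in contents])


def test_last_reply() -> None:
    # Newest sampled turn wins, and padding on earlier turns is irrelevant.
    assert make_trace(" a1 ", "a2").last_reply == "a2"
    # No assistant turns: empty string, not IndexError.
    assert make_trace().last_reply == ""
    # Final turn has no text: empty string, not None.
    assert make_trace(None).last_reply == ""
```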

Checklist

My code follows the style guidelines (AGENTS.md)
I have performed a self-review of my own code
I have commented my code (docstring on the property; inline rationale in the test)
I have made corresponding changes to the documentation (GUIDE.md)
My changes generate no new warnings (ruff clean)

Additional Notes

Companion to the Prime CLI v1 work (PrimeIntellect-ai/prime#766). last_reply is purely additive. The env swaps preserve behavior with one exception: the final reply now gets an outer .strip(). That is benign for the substring/regex/judge checks these rewards use, and it matches what the GUIDE's reverse-text example already did.
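
A small illustration of why the outer `.strip()` does not change substring or unanchored regex checks, and of the kind of check where it would matter. The sample reply text and the pattern are made up for this example and are not taken from any of the migrated envs.

```python
import re

raw = "\n  The answer is 42.\n"  # hypothetical final reply with padding
stripped = raw.strip()

# Substring checks only look inside the text, so padding is irrelevant.
substring_same = ("42" in raw) == ("42" in stripped)

# An unanchored regex search finds the same match either way.
pattern = r"answer is (\d+)"
raw_match = re.search(pattern, raw)
stripped_match = re.search(pattern, stripped)
regex_same = raw_match.group(1) == stripped_match.group(1)

# Whole-string equality is edge-sensitive: here the result does change.
exact_differs = (raw == "The answer is 42.") != (stripped == "The answer is 42.")
```

Rewards that compare the whole reply for equality, or that anchor a pattern to the start or end of the string, would see different input after the swap, so those are the cases worth double-checking in any further migration.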

Note

Low Risk
Additive trace helper plus mechanical reward refactors; only scoring input normalization changes (strip / empty-string fallback), with no auth, runtime, or wire-format changes.

Overview
Introduces Trace.last_reply on verifiers/v1/trace.py as the canonical way to read the model’s final sampled assistant text: whitespace-stripped content, or "" when there are no assistant messages or content is None.

Seven v1 tasksets (code-golf, deepwiki, glossary, gsm8k, reverse-text, scratchpad, wiki-search) and verifiers/v1/GUIDE.md examples now use trace.last_reply instead of hand-rolled assistant_messages[-1].content guards.

Scoring and stop hooks therefore get normalized, safe final-reply text (including an outer .strip()); empty traces no longer risk IndexError on [-1].

Reviewed by Cursor Bugbot for commit d76f20a. Bugbot is set up for automated code reviews on this repo.

Note

Add `Trace.last_reply` property and adopt it across v1 environments

Adds Trace.last_reply to trace.py as a convenience property returning the last assistant message content, whitespace-stripped, or an empty string when no assistant messages exist or the last one has no text.
Replaces manual trace.assistant_messages[-1].content access in reward functions across seven v1 environments (code_golf, deepwiki, glossary, gsm8k, reverse_text, scratchpad, wiki_search) with trace.last_reply.
Updates the GUIDE.md examples to reflect the new pattern.

Macroscope summarized d76f20a.

macroscopeapp · 2026-06-30T13:48:18Z

Approvability

Verdict: Approved

Adds a convenience property last_reply to consolidate repeated code patterns across v1 environments. The changes are mechanical adoptions of the new property. The only behavioral difference is adding .strip() to whitespace-trim content, which is appropriate for scoring contexts.

Add a `last_reply` property on `vf.Trace`: the stripped text of the final sampled assistant message, or `""` when there are no assistant messages or the last one has no text. A typed convenience over the `(assistant_messages[-1].content or "").strip()` idiom every scoring reward repeats, and safe by default (returns `""` instead of IndexError on an empty trace; `AssistantMessage.content` is `str | None`).

Adopt `trace.last_reply` in the v1 envs and docs that read the model's final reply as text for scoring/judging: reverse-text, gsm8k, glossary, deepwiki, code-golf, scratchpad, wiki-search, and the GUIDE snippets. Each previously hand-rolled `assistant_messages[-1].content if assistant_messages else ""` (or the unguarded `[-1].content`), so the swap also removes the empty-trace footgun.

Envs that need the full message list (alphabet-sort iterates every turn; color-codeword takes the message objects) are left on `assistant_messages`.

Verified: env tasksets import against the patched source; ruff clean.

Amp-Thread-ID: https://ampcode.com/threads/T-019f130a-103e-72fc-8182-e1a43db7d403

macroscopeapp Bot previously approved these changes Jun 30, 2026

xeophon dismissed macroscopeapp[bot]’s stale review via d76f20a June 30, 2026 13:48

xeophon force-pushed the feat/v1-trace-last-reply branch from a5ed55f to d76f20a June 30, 2026 13:48

macroscopeapp Bot approved these changes Jun 30, 2026

xeophon merged commit 135121c into main Jun 30, 2026
13 checks passed

xeophon merged 1 commit into main from feat/v1-trace-last-reply
