feat(v1): add Trace.last_reply and adopt it across v1 envs #1897

Merged: xeophon merged 1 commit into main from feat/v1-trace-last-reply on Jun 30, 2026

xeophon (Member) commented Jun 30, 2026

Description

Adds Trace.last_reply — the stripped text of the final sampled assistant message (or "" when there are no assistant messages / the last one has no text) — and adopts it across the v1 envs and docs that read the model's final reply as text for scoring/judging.

Every scoring reward that wants "the answer" hand-rolled the same idiom (assistant_messages[-1].content if assistant_messages else "", or the unguarded [-1].content). last_reply is the typed convenience over that, and safe by default — it returns "" instead of IndexError on an empty trace, and AssistantMessage.content is str | None, so the None case is handled too.
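A minimal sketch of the property described above, using stand-in classes rather than the real `verifiers` types. The `AssistantMessage` and `Trace` dataclasses here are assumptions that model only the two attributes the description mentions, and `hand_rolled` is a hypothetical name for the guarded idiom being replaced:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AssistantMessage:
    # Stand-in for the real message type; content may be None.
    content: Optional[str] = None


@dataclass
class Trace:
    # Stand-in for vf.Trace, holding only what this sketch needs.
    assistant_messages: List[AssistantMessage] = field(default_factory=list)

    @property
    def last_reply(self) -> str:
        """Stripped text of the final assistant message, or "" if absent."""
        msgs = self.assistant_messages
        return (msgs[-1].content or "").strip() if msgs else ""


def hand_rolled(trace: Trace) -> Optional[str]:
    # The guarded idiom: safe on an empty trace, but it still passes
    # a None content through to the caller.
    msgs = trace.assistant_messages
    return msgs[-1].content if msgs else ""
```

In this sketch the guarded idiom avoids the `IndexError` but can still hand `None` to a scoring function, while `last_reply` always returns a `str`.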

Type of Change

  • New feature (non-breaking change which adds functionality)
  • Test improvement
  • Documentation update

What changed

  • verifiers/v1/trace.py — new last_reply property (@property, next to assistant_messages): (msgs[-1].content or "").strip() if msgs else "".
  • tests/v1/test_trace.py::test_last_reply: newest sampled turn wins (" a1 " then "a2" → "a2"), no turns → "", None content → "".
  • v1 envs migrated to trace.last_reply (each previously hand-rolled the guard): reverse-text-v1, gsm8k-v1, glossary-v1, deepwiki-v1, code-golf-v1, scratchpad-v1, wiki-search-v1.
  • verifiers/v1/GUIDE.md — the canonical reverse-text reward and two other snippets now use trace.last_reply.
  • Not migrated (intentionally): alphabet-sort-v1 iterates every assistant turn ([m.content or "" for m in trace.assistant_messages]); color-codeword-v1 takes the AssistantMessage objects themselves. Neither is "last reply" semantics, so they stay on assistant_messages.
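The migration and the two exceptions can be illustrated roughly as follows. This is a sketch under assumptions: `Msg` and `Trace` are hypothetical stand-ins for the real types, and the reward functions are invented for illustration, not copied from any of the listed envs.

```python
from typing import List, Optional


class Msg:
    """Hypothetical stand-in for an assistant message."""

    def __init__(self, content: Optional[str]) -> None:
        self.content = content


class Trace:
    """Hypothetical stand-in for vf.Trace, not the real class."""

    def __init__(self, contents: List[Optional[str]]) -> None:
        self.assistant_messages = [Msg(c) for c in contents]

    @property
    def last_reply(self) -> str:
        msgs = self.assistant_messages
        return (msgs[-1].content or "").strip() if msgs else ""


def exact_match_before(trace: Trace, answer: str) -> float:
    # Roughly the hand-rolled shape: guard the index, then normalize.
    msgs = trace.assistant_messages
    reply = msgs[-1].content if msgs else ""
    return 1.0 if (reply or "").strip() == answer else 0.0


def exact_match_after(trace: Trace, answer: str) -> float:
    # Same check, with the guard and normalization moved into the property.
    return 1.0 if trace.last_reply == answer else 0.0


def all_turn_texts(trace: Trace) -> List[str]:
    # Per-turn access (the alphabet-sort pattern) still needs the full list,
    # so it stays on assistant_messages.
    return [m.content or "" for m in trace.assistant_messages]
```

A reward that reads only the final text fits `last_reply`; one that walks every turn, or needs the message objects themselves, does not.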

Testing

  • New tests have been added to cover the changes — tests/v1/test_trace.py::test_last_reply (passes; full test_trace.py green, 4/4).
  • All migrated env tasksets import against the patched source (syntactically + semantically valid against the new Trace.last_reply).
  • ruff check clean on all changed files.
  • The @pytest.mark.e2e env runs (real model rollouts + runtimes) are not executed locally; the change is a read-only property with no new imports and is covered by the unit test + import check.

Checklist

  • My code follows the style guidelines (AGENTS.md)
  • I have performed a self-review of my own code
  • I have commented my code (docstring on the property; inline rationale in the test)
  • I have made corresponding changes to the documentation (GUIDE.md)
  • My changes generate no new warnings (ruff clean)

Additional Notes

Companion to the Prime CLI v1 work (PrimeIntellect-ai/prime#766). last_reply is purely additive; the env swaps are behavior-preserving (the only behavioral change is an outer .strip() on the final reply, which is benign for the substring/regex/judge checks these rewards use — and matches what the GUIDE's reverse-text example already did).
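To make the "benign" claim concrete, here is a small sketch comparing a substring check and a regex extraction on raw versus stripped text. The sample string and the helper names `contains` and `last_number` are made up for illustration, and the equivalence only holds for needles and patterns that do not themselves depend on leading or trailing whitespace:

```python
import re
from typing import Optional

raw = "  The answer is 42.\n"
stripped = raw.strip()


def contains(text: str, needle: str) -> bool:
    # Plain substring check, as a simple scoring rule might use.
    return needle in text


def last_number(text: str) -> Optional[str]:
    # Pull the last integer out of the text, or None if there is none.
    found = re.findall(r"-?\d+", text)
    return found[-1] if found else None


# Both checks give the same result before and after the outer strip.
same_substring = contains(raw, "42") == contains(stripped, "42")
same_number = last_number(raw) == last_number(stripped)
```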


Note

Low Risk
Additive trace helper plus mechanical reward refactors; only scoring input normalization changes (strip / empty-string fallback), with no auth, runtime, or wire-format changes.

Overview
Introduces Trace.last_reply on verifiers/v1/trace.py as the canonical way to read the model’s final sampled assistant text: whitespace-stripped content, or "" when there are no assistant messages or content is None.

Seven v1 tasksets (code-golf, deepwiki, glossary, gsm8k, reverse-text, scratchpad, wiki-search) and verifiers/v1/GUIDE.md examples now use trace.last_reply instead of hand-rolled assistant_messages[-1].content guards.

Scoring and stop hooks therefore get normalized, safe final-reply text (including an outer .strip()); empty traces no longer risk IndexError on [-1].

Reviewed by Cursor Bugbot for commit d76f20a.

Note

Add Trace.last_reply property and adopt it across v1 environments

  • Adds Trace.last_reply to trace.py as a convenience property returning the last assistant message content, whitespace-stripped, or an empty string when no assistant messages exist.
  • Replaces manual trace.assistant_messages[-1].content access in reward functions across seven v1 environments (code_golf, deepwiki, glossary, gsm8k, reverse_text, scratchpad, wiki_search) with trace.last_reply.
  • Updates the GUIDE.md examples to reflect the new pattern.

Macroscope summarized d76f20a.

macroscopeapp Bot previously approved these changes Jun 30, 2026

macroscopeapp Bot commented Jun 30, 2026

Approvability

Verdict: Approved

Adds a convenience property last_reply to consolidate repeated code patterns across v1 environments. The changes are mechanical adoptions of the new property. The only behavioral difference is adding .strip() to whitespace-trim content, which is appropriate for scoring contexts.

Add a `last_reply` property on `vf.Trace`: the stripped text of the final
sampled assistant message, or `""` when there are no assistant messages or
the last one has no text. A typed convenience over the
`(assistant_messages[-1].content or "").strip()` idiom every scoring reward
repeats, and safe by default (returns "" instead of IndexError on an empty
trace; AssistantMessage.content is `str | None`).

Adopt `trace.last_reply` in the v1 envs and docs that read the model's final
reply as text for scoring/judging — reverse-text, gsm8k, glossary, deepwiki,
code-golf, scratchpad, wiki-search, and the GUIDE snippets. Each previously
hand-rolled `assistant_messages[-1].content if assistant_messages else ""`
(or the unguarded `[-1].content`), so the swap also removes the empty-trace
footgun. Envs that need the full message list (alphabet-sort iterates every
turn; color-codeword takes the message objects) are left on `assistant_messages`.

Verified: env tasksets import against the patched source; ruff clean.

Amp-Thread-ID: https://ampcode.com/threads/T-019f130a-103e-72fc-8182-e1a43db7d403
@xeophon xeophon merged commit 135121c into main Jun 30, 2026
13 checks passed