
chore: drop rlm_ prefix from rlm harness metrics#1911

Merged
mikasenghaas merged 1 commit into main from chore/drop-rlm-metrics-prefix
Jul 2, 2026

Conversation

@mikasenghaas mikasenghaas commented Jul 1, 2026

Member

Summary

Surface rlm's meta.json metrics under their own names instead of prefixing them with rlm_ (verifiers/v1/harnesses/rlm/harness.py). The metric keys are already rlm-specific (num_compactions, ipython_input_*_mean, num_ptc_calls, ...), so the prefix was redundant.

Pairs with PrimeIntellect-ai/rlm-harness#110, which prunes the rlm metric set down to only what verifiers v1 can't derive natively (compaction volume, ipython input size, programmatic tool calls) and renames the survivors.

Breaking

  • rlm harness metrics now appear on the trace as e.g. num_compactions instead of rlm_num_compactions. Any dashboard/query keyed on rlm_* for rlm runs must drop the prefix.
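For consumers that still hold `rlm_*` keys (saved queries, dashboard configs, older exported rows), the rename can be sketched as a small key migration. This is a minimal sketch, not part of this PR: `drop_rlm_prefix` and `migrate_metric_keys` are hypothetical helper names, and the sample values are placeholders rather than logged results.

```python
RLM_PREFIX = "rlm_"


def drop_rlm_prefix(key: str) -> str:
    """Return the key without a leading 'rlm_'; other keys pass through unchanged."""
    return key.removeprefix(RLM_PREFIX)  # str.removeprefix requires Python 3.9+


def migrate_metric_keys(metrics: dict[str, float]) -> dict[str, float]:
    """Rename rlm_* keys to their bare names, refusing to silently merge collisions."""
    migrated: dict[str, float] = {}
    for key, value in metrics.items():
        new_key = drop_rlm_prefix(key)
        if new_key in migrated:
            raise ValueError(f"key collision after dropping prefix: {new_key!r}")
        migrated[new_key] = value
    return migrated


# Placeholder row mixing an old prefixed key with a native v1 key.
old_row = {"rlm_num_compactions": 2.0, "num_turns": 15.0}
new_row = migrate_metric_keys(old_row)
print(new_row)
```

The collision check matters if a store already contains both `rlm_x` and `x` for the same run; in that case the two values need to be reconciled by hand rather than overwritten.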

Verification

Ran terminal-bench-2-v1 fix-git on the prime runtime with rlm installed from the rlm-harness#110 branch (--harness.version chore/prune-redundant-metrics) and this prefix drop active.

  • Task solved (solved: 1.0, agent_completed, 15 turns).
  • Metrics logged on the trace (bare names — prefix drop confirmed):
    num_compactions, turns_since_last_compaction, turns_between_compactions_mean, compaction_chars_dropped_mean, compaction_summary_chars_mean, ipython_input_chars_mean (81.5), ipython_input_loc_mean (2.2), num_ptc_calls, num_ptc_calls_bash.
  • The removed rlm metrics are covered natively by the v1 trace on the same rollout: num_branches=1, num_turns=15, completion_tokens=2248 (sum of node usage).

Note

Drop rlm_ prefix from RLM harness metric keys

In harness.py, the RLMHarness.rlm coroutine now returns metrics using the raw keys from meta['metrics'] instead of prefixing each key with rlm_. Behavioral Change: any consumer reading prefixed keys like rlm_accuracy must update to use the unprefixed form (e.g. accuracy).

Macroscope summarized c396dd6.


Note

Medium Risk
Renaming trace metric keys is a breaking change for any consumer keyed on rlm_*, though runtime behavior is unchanged.

Overview
RLM harness metrics now use the same keys as meta.json (num_compactions, ipython_input_chars_mean, etc.) instead of prefixing every value with rlm_.

This PR only changes how the RLMHarness.rlm metric coroutine names keys when numeric fields from sessions/*/meta.json are merged onto the trace; filtering of non-numeric metrics is unchanged. Comments were updated to match the slimmer rlm metric set (compactions, ipython input size, programmatic tool calls).
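The merge described above can be illustrated roughly as follows. This is a sketch under stated assumptions, not the code in harness.py: `collect_session_metrics` is a hypothetical function, the `metrics` field layout is inferred from the `meta['metrics']` reference in the summary, and letting later sessions overwrite earlier ones is an arbitrary choice made only for the example.

```python
import json
import tempfile
from numbers import Real
from pathlib import Path


def collect_session_metrics(run_dir: Path) -> dict[str, float]:
    """Merge numeric metrics from sessions/*/meta.json under their own names."""
    merged: dict[str, float] = {}
    for meta_path in sorted(run_dir.glob("sessions/*/meta.json")):
        meta = json.loads(meta_path.read_text())
        for key, value in meta.get("metrics", {}).items():
            # Keep numeric values only. bool is a subclass of int, so it is
            # excluded explicitly (an assumption of this sketch).
            if isinstance(value, Real) and not isinstance(value, bool):
                # Before this PR the key would have been f"rlm_{key}".
                merged[key] = float(value)
    return merged


# Toy directory with made-up values, only to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    session_dir = Path(tmp) / "sessions" / "s0"
    session_dir.mkdir(parents=True)
    toy_meta = {"metrics": {"num_compactions": 0, "num_ptc_calls": 3, "note": "text"}}
    (session_dir / "meta.json").write_text(json.dumps(toy_meta))
    demo_metrics = collect_session_metrics(Path(tmp))

print(demo_metrics)
```

The non-numeric `note` entry is dropped and the remaining keys appear exactly as written in meta.json, which is the behavior the rename is meant to produce.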

Breaking: dashboards or queries that expect rlm_* keys must switch to the unprefixed names. This pairs with rlm-harness pruning redundant metrics that v1 already derives from the trace.

Reviewed by Cursor Bugbot for commit c396dd6. Bugbot is set up for automated code reviews on this repo.

Surface rlm's meta.json metrics under their own names instead of an
`rlm_` prefix. The metric keys (num_compactions, ipython_input_*_mean,
num_ptc_calls, ...) are already rlm-specific, so the prefix was redundant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread verifiers/v1/harnesses/rlm/harness.py
@mikasenghaas mikasenghaas requested a review from snimu July 1, 2026 22:30
@mikasenghaas mikasenghaas marked this pull request as ready for review July 2, 2026 00:18
@mikasenghaas mikasenghaas merged commit 101827c into main Jul 2, 2026
12 checks passed
macroscopeapp Bot commented Jul 2, 2026


Approvability

Verdict: Approved

This is a mechanical rename that drops the 'rlm_' prefix from metric keys. The underlying metric collection logic is unchanged - only the key naming format is simplified.



2 participants