chore: drop rlm_ prefix from rlm harness metrics #1911
Merged
Surface rlm's `meta.json` metrics under their own names instead of an `rlm_` prefix. The metric keys (`num_compactions`, `ipython_input_*_mean`, `num_ptc_calls`, ...) are already rlm-specific, so the prefix was redundant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
snimu approved these changes on Jul 1, 2026
Approvability verdict: Approved. This is a mechanical rename that drops the `rlm_` prefix from metric keys. The underlying metric collection logic is unchanged; only the key naming format is simplified.
Summary
Surface rlm's `meta.json` metrics under their own names instead of prefixing them with `rlm_` (`verifiers/v1/harnesses/rlm/harness.py`). The metric keys are already rlm-specific (`num_compactions`, `ipython_input_*_mean`, `num_ptc_calls`, ...), so the prefix was redundant.

Pairs with PrimeIntellect-ai/rlm-harness#110, which prunes the rlm metric set down to only what verifiers v1 can't derive natively (compaction volume, ipython input size, programmatic tool calls) and renames the survivors.
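A minimal sketch of the key change, assuming `meta['metrics']` is a flat mapping from metric name to value. `old_metric_keys` and `new_metric_keys` are hypothetical helpers for illustration, not the code in `harness.py`, and the metric values are placeholders.

```python
from typing import Any


def old_metric_keys(metrics: dict[str, Any]) -> dict[str, Any]:
    # Previous behavior (sketch): every rlm metric key was surfaced with an `rlm_` prefix.
    return {f"rlm_{key}": value for key, value in metrics.items()}


def new_metric_keys(metrics: dict[str, Any]) -> dict[str, Any]:
    # New behavior (sketch): rlm metric keys are surfaced under their own names.
    return dict(metrics)


# Placeholder values; only the key names matter here.
meta_metrics = {"num_compactions": 1, "ipython_input_chars_mean": 81.5, "num_ptc_calls": 0}
print(old_metric_keys(meta_metrics))
print(new_metric_keys(meta_metrics))
```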
Breaking
Metric keys are now unprefixed, e.g. `num_compactions` instead of `rlm_num_compactions`. Any dashboard or query keyed on `rlm_*` for rlm runs must drop the prefix.

Verification
Ran `terminal-bench-2-v1` `fix-git` on the `prime` runtime with rlm installed from the rlm-harness#110 branch (`--harness.version chore/prune-redundant-metrics`) and this prefix drop active.

- Result: `solved: 1.0`, `agent_completed`, 15 turns.
- rlm metrics surfaced unprefixed: `num_compactions`, `turns_since_last_compaction`, `turns_between_compactions_mean`, `compaction_chars_dropped_mean`, `compaction_summary_chars_mean`, `ipython_input_chars_mean` (81.5), `ipython_input_loc_mean` (2.2), `num_ptc_calls`, `num_ptc_calls_bash`.
- `num_branches=1`, `num_turns=15`, `completion_tokens=2248` (sum of node usage).

Note
Drop `rlm_` prefix from RLM harness metric keys

In `harness.py`, the `RLMHarness.rlm` coroutine now returns metrics using the raw keys from `meta['metrics']` instead of prefixing each key with `rlm_`.

Behavioral change: any consumer reading prefixed keys like `rlm_accuracy` must update to use the unprefixed form (e.g. `accuracy`).

Macroscope summarized c396dd6.
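For consumers that still read historical `rlm_*` records, a small helper can normalize old keys to the new names. This is a hypothetical sketch, not part of verifiers. It only strips the prefix; because rlm-harness#110 also prunes and renames metrics, a stripped key may not exist in newer runs.

```python
from typing import Any

PREFIX = "rlm_"


def migrate_metric_keys(metrics: dict[str, Any]) -> dict[str, Any]:
    """Rename old `rlm_*` keys to their unprefixed form; other keys pass through unchanged.

    Raises ValueError if stripping the prefix collides with a key already present
    (e.g. a record containing both `rlm_num_compactions` and `num_compactions`).
    """
    migrated: dict[str, Any] = {}
    for key, value in metrics.items():
        new_key = key.removeprefix(PREFIX)  # Python 3.9+
        if new_key in migrated:
            raise ValueError(f"key collision after stripping {PREFIX!r}: {new_key!r}")
        migrated[new_key] = value
    return migrated


print(migrate_metric_keys({"rlm_num_compactions": 1, "num_turns": 15}))
# {'num_compactions': 1, 'num_turns': 15}
```

Raising on a collision, rather than silently overwriting, surfaces records that mix old and new key names.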
Note
Medium Risk
Renaming trace metric keys is a breaking change for any consumer keyed on `rlm_*`, though runtime behavior is unchanged.

Overview
RLM harness metrics now use the same keys as `meta.json` (`num_compactions`, `ipython_input_chars_mean`, etc.) instead of prefixing every key with `rlm_`.

The `RLMHarness.rlm` metric coroutine only changes how keys are named when numeric fields from `sessions/*/meta.json` are merged onto the trace; filtering of non-numeric metrics is unchanged. Comments were updated to match the slimmer rlm metric set (compactions, ipython input size, programmatic tool calls).

Breaking: dashboards or queries that expect `rlm_*` keys must switch to the unprefixed names. This pairs with rlm-harness pruning redundant metrics that v1 already derives from the trace.

Reviewed by Cursor Bugbot for commit c396dd6.
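The overview above describes numeric fields from `sessions/*/meta.json` being merged onto the trace with non-numeric metrics filtered out. A rough synchronous sketch of that flow with unprefixed keys follows, assuming each `meta.json` holds a `metrics` object. The function names, treating booleans as non-numeric, and the last-file-wins merge order are all assumptions; the real `RLMHarness.rlm` is a coroutine and may combine sessions differently.

```python
import json
from collections.abc import Iterable
from pathlib import Path
from typing import Any


def is_numeric(value: Any) -> bool:
    # Assumption: bools are excluded even though bool subclasses int.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def merge_numeric_metrics(metas: Iterable[dict[str, Any]]) -> dict[str, float]:
    """Merge numeric entries of each meta's `metrics` object, keeping keys unprefixed."""
    merged: dict[str, float] = {}
    for meta in metas:
        for key, value in meta.get("metrics", {}).items():
            if is_numeric(value):
                merged[key] = value  # later sessions overwrite earlier ones (assumption)
    return merged


def collect_rlm_metrics(run_dir: Path) -> dict[str, float]:
    """Read run_dir/sessions/*/meta.json in sorted path order and merge their metrics."""
    paths = sorted(run_dir.glob("sessions/*/meta.json"))
    return merge_numeric_metrics(json.loads(p.read_text()) for p in paths)
```

Splitting the pure merge step from the file I/O keeps the filtering rule testable without touching the filesystem.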