Fix duplicate column names from summarise_scores() with empty metrics (#1179) by nikosbosse · Pull Request #1180 · epiforecasts/scoringutils

nikosbosse · 2026-05-28T14:33:18Z

CLAUDE: Closes #1179.

Summary

summarise_scores() previously selected which columns to summarise via colnames(scores) %like% paste(metrics, collapse = "|"). When the metrics attribute is character(0) (i.e. every metric in score() warned and returned nothing), this pattern becomes the empty string, which %like% matches against every column — including the by column. The by column was then passed to the summary function, producing the spurious "argument is not numeric or logical" warning and a data.table with a duplicate by column. The duplicate is invisible inside data.table but breaks downstream conversion to tibble.
Switched to exact column-name matching via intersect(colnames(scores), metrics). This also incidentally fixes a latent issue where a metric named e.g. "wis" would have matched any column whose name contained "wis" (such as "wis_relative_skill").
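To make both failure modes concrete, here is a small base-R sketch. The column names are made up for illustration. `grepl()` stands in for data.table's `%like%`, since both perform a regex match on the column names. `old_select()` and `new_select()` are hypothetical helpers, not scoringutils functions.

```r
# Hypothetical column names of a scores table: one `by` column, two metrics,
# and a derived column whose name contains "wis" as a substring.
cols <- c("model", "wis", "wis_relative_skill", "bias")

# Old approach (base-R stand-in for `%like%`): regex alternation of metric names.
old_select <- function(cols, metrics) {
  cols[grepl(paste(metrics, collapse = "|"), cols)]
}

# New approach: exact name matching.
new_select <- function(cols, metrics) {
  intersect(cols, metrics)
}

# paste(character(0), collapse = "|") is "", and an empty regex matches everything.
old_select(cols, character(0))  # all four columns, including the `by` column "model"
new_select(cols, character(0))  # character(0)

# Partial matching: "wis" also matches "wis_relative_skill".
old_select(cols, "wis")         # "wis" "wis_relative_skill"
new_select(cols, "wis")         # "wis"
```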

Both tightenings from #1179 are implemented

The issue suggested two fixes (either or both); this PR does both:

summarise_scores() no longer summarises its by columns. With exact-name matching, .SDcols is now intersect(colnames(scores), metrics), so task-ID columns (including the by column) are never passed to the summary function. This removes the duplicate-column root cause.
summarise_scores() errors early when there are no score columns to summarise. When intersect(colnames(scores), metrics) is empty there is nothing meaningful to return, so it now aborts with a clear message rather than producing a malformed object.
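A minimal sketch of the two tightenings together follows. It assumes a plain data.frame and base R's `aggregate()` in place of the data.table grouping that scoringutils actually uses. `summarise_metric_cols()` is a hypothetical name, and the real code aborts via `cli::cli_abort()` with its own wording rather than the placeholder `stop()` message used here.

```r
# Hypothetical sketch, not the scoringutils implementation.
summarise_metric_cols <- function(scores, metrics, by, fun = mean) {
  # Exact-name matching: only columns listed in `metrics` are selected,
  # so the `by` columns can never end up being summarised.
  metric_cols <- intersect(colnames(scores), metrics)
  # Tightening 2: nothing to summarise, so fail early instead of
  # returning a malformed object. (Placeholder message.)
  if (length(metric_cols) == 0) {
    stop("No score columns to summarise.", call. = FALSE)
  }
  # Apply `fun` to each metric column within groups defined by `by`.
  aggregate(scores[metric_cols], by = scores[by], FUN = fun)
}

scores <- data.frame(
  model = c("a", "a", "b", "b"),
  wis = c(1, 3, 2, 4),
  wis_relative_skill = c(0.9, 1.1, 1.0, 1.2)
)

summarise_metric_cols(scores, metrics = "wis", by = "model")
#   model wis
# 1     a   2
# 2     b   3

# Empty metrics: stops with "No score columns to summarise."
try(summarise_metric_cols(scores, metrics = character(0), by = "model"))
```

Because `metric_cols` can only contain names that appear in `metrics`, the `by` column is never passed to the summary function, and an empty selection stops before any grouping happens.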

Tests

End-to-end regression test for the issue reprex: scoring example_quantile with only interval_coverage_55 warns and produces no score columns, after which summarise_scores() errors.
A unit-level test of the same empty-metrics case (manually cleared score columns).
A test guarding against partial-name matching.

Out of scope

The issue also raises whether score() should itself fail (rather than return an empty scores object) when every metric fails. That's a real question but a bigger design call; leaving it for a separate discussion.

Test plan

Targeted tests pass locally (testthat::test_file("tests/testthat/test-summarise_scores.R") — 16 pass)
lintr::lint() clean on changed files
CI green (full R CMD check / covr relied on CI)
Branch rebased onto current main, which also picks up the multivariate-sample snapshot fix (#1182, "Fix macOS CI snapshot precision + lint-changed-files workflow"), so macOS CI should be green.

This was opened by a bot. Please ping @seabbs for any questions.

codecov · 2026-05-28T14:37:41Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.99%. Comparing base (abd733e) to head (d124e69).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1180   +/-   ##
=======================================
  Coverage   97.98%   97.99%           
=======================================
  Files          38       38           
  Lines        2036     2045    +9     
=======================================
+ Hits         1995     2004    +9     
  Misses         41       41


nikosbosse · 2026-05-28T17:26:19Z

CLAUDE: All checks pass except macOS-latest (release). It fails on two tests (test-class-forecast-multivariate-sample.R:343 and :376, in "score.forecast_multivariate_sample() creates expected output"), and the same two tests fail at the same lines on main as well (e.g. run 25738222052). This is pre-existing macOS-specific breakage in the multivariate sample tests, unrelated to this PR.

seabbs · 2026-05-29T08:36:07Z

#1179 contains a nice reprex test case; can we test against it? It wasn't clear to me whether this was from you, @nikosbosse, or from Claude. I guess our practice for review is to wait to be tagged, but maybe making that clearer in the PR description would be good?

seabbs · 2026-05-29T08:37:25Z

This looks good to go otherwise, though it would be nice to explicitly state that both tightenings suggested in #1179 have been implemented (it looks to me like they have).

seabbs-bot · 2026-05-29T09:27:27Z

Addressed the review feedback:

Reprex test: added an end-to-end regression test that scores `example_quantile` with only `interval_coverage_55` (the exact reprex from #1179, "summarize_scores() produces duplicate column names when input has no score columns"). It warns, produces no score columns, and `summarise_scores()` then errors. Kept the unit-level empty-metrics test alongside it.
Both tightenings from #1179: confirmed both are implemented and now spelled out in the PR description: (1) `by` columns are no longer summarised (exact-name `.SDcols` matching), and (2) there is an early error when there are no score columns.
Up to date with main: rebased onto current `main`, which also brings in the multivariate-sample snapshot fix (#1182, "Fix macOS CI snapshot precision + lint-changed-files workflow"), so the previously failing macOS tests should now pass.


seabbs-bot · 2026-05-29T09:27:56Z

Automated review pass (agent quality gate)

No Critical or Important findings. The change is correct and well-scoped.

Observations:

The fix replaces a fragile regex partial-match (`%like% paste(metrics, collapse = "|")`) with exact-name matching (`intersect(colnames(scores), metrics)`), which also fixes the latent partial-match bug (e.g. "wis" matching "wis_relative_skill"). Good.
Turning the previously-silent duplicate-column case into an explicit `cli_abort` is the right call and matches the issue intent.
Line 82 sets `attr(scores, "metrics") <- metrics` (full vector) rather than `metric_cols`. This preserves prior behaviour and is reachable only when `metric_cols` is non-empty, so it is fine; just noting it is intentional.
`cli` and `cli_abort` are already package deps/imports; the multi-line message string matches the existing codebase convention (cli collapses internal whitespace on render).
Regression tests cover the empty-metrics error, the end-to-end reprex from #1179, and the partial-match guard. `scores_quantile` fixture confirmed present in setup.R.
`lintr::lint()` clean on both changed files locally.

CI is pending; will continue to monitor checks and mergeability.


seabbs

@seabbs-bot added the explicit reprex, so this looks good to me now.

`summarise_scores()` selected the columns to summarise via `colnames(scores) %like% paste(metrics, collapse = "|")`. When the `metrics` attribute is empty (which happens when every metric passed to `score()` warned and returned nothing), the pattern becomes the empty string, which `%like%` matches against every column. The `by` column was then passed to the summary function, producing a duplicate `by` column in the output and the spurious "argument is not numeric or logical" warning. Switch to exact column-name matching via `intersect()` and error early when there is nothing to summarise. This also incidentally fixes a latent issue where a metric named e.g. "wis" would have matched any column whose name contained "wis" (such as "wis_relative_skill").

Closes #1179.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a regression test exercising the exact reprex from #1179: scoring example_quantile with only `interval_coverage_55` warns and produces no score columns, after which `summarise_scores()` must error rather than return a data.table with a duplicate `by` column.

nikosbosse · 2026-05-30T09:39:37Z

Sweet, thank you! Yes, I hadn't tagged you yet because I hadn't thought about it deeply enough myself. Looks good to me.

Replaces the local dedup-before-as_tibble workaround for the scoringutils duplicate-column bug with an eager abort that mirrors the upstream fix (epiforecasts/scoringutils#1180). The dedup is no longer needed: the bug only manifested when `summarise_scores()` was called with an empty metrics attribute, and the new abort fires before that call. Verified against CRAN scoringutils 2.2.0 that the buggy code path is never reached. Wording and condition match the upstream implementation exactly, so user-facing behaviour will not shift once the scoringutils minimum version is bumped and the mirrored block is deleted. Two tests that previously expected only the upstream scoringutils warning now expect the new abort message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

seabbs self-requested a review May 29, 2026 08:24

seabbs-bot force-pushed the fix/summarise-scores-empty-metrics-1179 branch from 466b46e to 487fcc2 May 29, 2026 09:26

seabbs-bot force-pushed the fix/summarise-scores-empty-metrics-1179 branch from 487fcc2 to 409145b May 29, 2026 09:32

seabbs approved these changes May 29, 2026


seabbs enabled auto-merge May 29, 2026 09:32

nikosbosse and others added 2 commits May 29, 2026 10:35

seabbs-bot force-pushed the fix/summarise-scores-empty-metrics-1179 branch from 409145b to d124e69 May 29, 2026 09:36

nikosbosse disabled auto-merge May 30, 2026 09:40

nikosbosse merged commit 0e15277 into main May 30, 2026
11 checks passed

nikosbosse deleted the fix/summarise-scores-empty-metrics-1179 branch May 30, 2026 09:40

This was referenced Jun 1, 2026

Remove scoringutils#1180 workaround mirror once upstream fix is on CRAN hubverse-org/hubEvals#118 (open)

CRAN prep: tibble return + oracle_output_id fix (#70, #73) hubverse-org/hubEvals#117 (merged)
