Fix duplicate column names from summarise_scores() with empty metrics (#1179)#1180
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1180 +/- ##
=======================================
Coverage 97.98% 97.99%
=======================================
Files 38 38
Lines 2036 2045 +9
=======================================
+ Hits 1995 2004 +9
Misses 41 41 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
CLAUDE: All checks pass except |
|
#1179 contains a nice reprex test case can we test against it. I wasn't clear if this was from you @nikosbosse or claude. I guess our practice for review is to wait to be tagged but maybe making it clearer in the PR description would be good:? |
|
This looks good to go otherwise though it would be nice to explicitly state that both the tightenings suggested in #1179 have been implemented (it looks to me that they have) |
466b46e to
487fcc2
Compare
|
Addressed the review feedback:
This was opened by a bot. Please ping @seabbs for any questions. |
|
Automated review pass (agent quality gate) No Critical or Important findings. The change is correct and well-scoped. Observations:
CI is pending; will continue to monitor checks and mergeability. This was opened by a bot. Please ping @seabbs for any questions. |
487fcc2 to
409145b
Compare
seabbs
left a comment
There was a problem hiding this comment.
@seabbs-bot added the explicit reprex so this looks good to me now.
`summarise_scores()` selected the columns to summarise via `colnames(scores) %like% paste(metrics, collapse = "|")`. When the `metrics` attribute is empty (which happens when every metric passed to `score()` warned and returned nothing), the pattern becomes the empty string, which `%like%` matches against every column. The `by` column was then passed to the summary function, producing a duplicate `by` column in the output and the spurious "argument is not numeric or logical" warning. Switch to exact column-name matching via `intersect()` and error early when there is nothing to summarise. This also incidentally fixes a latent issue where a metric named e.g. "wis" would have matched any column whose name contained "wis" (such as "wis_relative_skill"). Closes #1179. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a regression test exercising the exact reprex from #1179: scoring example_quantile with only `interval_coverage_55` warns and produces no score columns, after which `summarise_scores()` must error rather than return a data.table with a duplicate `by` column.
409145b to
d124e69
Compare
|
Sweet, thank you! Yes haven't tagged you yet because I hadn't thought about it deeply enough myself. Looks good to me |
Replaces the local dedup-before-as_tibble workaround for the scoringutils duplicate-column bug with an eager abort that mirrors the upstream fix (epiforecasts/scoringutils#1180). The dedup is no longer needed: the bug only manifested when `summarise_scores()` was called with an empty metrics attribute, and the new abort fires before that call. Verified against CRAN scoringutils 2.2.0 that the buggy code path is never reached. Wording and condition match the upstream implementation exactly so user-facing behaviour will not shift once the scoringutils minimum version is bumped and the mirrored block is deleted. Two tests that previously expected only the upstream scoringutils warning now expect the new abort message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE: Closes #1179.
Summary
summarise_scores()previously selected which columns to summarise viacolnames(scores) %like% paste(metrics, collapse = "|"). When themetricsattribute ischaracter(0)(i.e. every metric inscore()warned and returned nothing), this pattern becomes the empty string, which%like%matches against every column — including thebycolumn. Thebycolumn was then passed to the summary function, producing the spurious "argument is not numeric or logical" warning and a data.table with a duplicatebycolumn. The duplicate is invisible inside data.table but breaks downstream conversion to tibble.intersect(colnames(scores), metrics). This also incidentally fixes a latent issue where a metric named e.g. "wis" would have matched any column whose name contained "wis" (such as "wis_relative_skill").Both tightenings from #1179 are implemented
The issue suggested two fixes (either or both); this PR does both:
summarise_scores()no longer summarises itsbycolumns. With exact-name matching,.SDcolsis nowintersect(colnames(scores), metrics), so task-ID columns (including thebycolumn) are never passed to the summary function. This removes the duplicate-column root cause.summarise_scores()errors early when there are no score columns to summarise. Whenintersect(colnames(scores), metrics)is empty there is nothing meaningful to return, so it now aborts with a clear message rather than producing a malformed object.Tests
example_quantilewith onlyinterval_coverage_55warns and produces no score columns, after whichsummarise_scores()errors.Out of scope
The issue also raises whether
score()should itself fail (rather than return an empty scores object) when every metric fails. That's a real question but a bigger design call; leaving it for a separate discussion.Test plan
testthat::test_file("tests/testthat/test-summarise_scores.R")— 16 pass)lintr::lint()clean on changed filesR CMD check/covrrelied on CI)main, which also picks up the multivariate-sample snapshot fix (Fix macOS CI snapshot precision + lint-changed-files workflow #1182) so macOS CI should be green.This was opened by a bot. Please ping @seabbs for any questions.