Fix silent NaN results and KeyError in test of no effect#5249
Fix silent NaN results and KeyError in test of no effect#5249ricardoofnl wants to merge 1 commit into
Conversation
TestOfNoEffectAnalysis and the underlying functions in ax/utils/stats/no_effects.py mishandled three degenerate inputs: 1. Data without the optional `n` column (as produced by standard trial evaluation, e.g. `Client.complete_trial`) crashed with a bare KeyError: 'n'. Now raises a UserInputError explaining that per-arm sample sizes are required. 2. Deterministic data (sem == 0) made Welch's test divide by zero, producing a NaN p-value that silently read as "no effect" even when the exactly-known means clearly differed - the healthcheck warned "no effects detected" in exactly the wrong situation. Zero-variance arms are now handled explicitly: differing exactly-known means are an exact effect (p = 0), identical all-deterministic means are no effect (p = 1), and a mixed zero/positive-sem group raises a UserInputError since the test is undefined. 3. Single-arm groups (K == 1) divided by K - 1 == 0 and silently produced NaN / "no effect". Such groups are now skipped in check_experiment_effects_per_metric, no_effect_test_welch validates K >= 2 and n > 1, and TestOfNoEffectAnalysis raises a UserInputError when no trial has two or more arms. Also corrects the check_experiment_effects docstring, which described ineffective_on_objectives with the same sentence as effective.
|
Hi @ricardoofnl! Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks! |
|
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
|
@saitcakmak has imported this pull request. If you are a Meta employee, you can view this in D110608351. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5249 +/- ##
=======================================
Coverage 96.56% 96.57%
=======================================
Files 619 620 +1
Lines 70307 70385 +78
=======================================
+ Hits 67895 67972 +77
- Misses 2412 2413 +1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Summary
Fixes #5248.
TestOfNoEffectAnalysisand the underlying functions inax/utils/stats/no_effects.pymishandled three degenerate inputs:Data without the optional
ncolumn (as produced by standard trial evaluation, e.g.Client.complete_trial) crashed with a bareKeyError: 'n'. A new_validate_sample_sizeshelper now raises aUserInputErrorexplaining that per-arm sample sizes are required, from bothcheck_experiment_effects_per_metricandcheck_experiment_effects.Deterministic data (
sem == 0) made Welch's test divide by zero, producing a NaN p-value that silently read as "no effect" even when the exactly-known means clearly differed — the healthcheck warned "no effects detected" in exactly the wrong situation. Zero-variance arms are now handled explicitly inno_effect_test_welch:p = 0),p = 1),UserInputError, since Welch's test is undefined there.Single-arm groups (
K == 1) divided byK - 1 == 0and silently produced NaN / "no effect". Such groups are now skipped incheck_experiment_effects_per_metric(a single-arm group cannot show within-trial effects),no_effect_test_welchvalidatesK >= 2andn > 1, andTestOfNoEffectAnalysisraises aUserInputErrorwhen no trial has two or more arms.Also corrects the
check_experiment_effectsdocstring, which describedineffective_on_objectiveswith the same sentence aseffective.Test plan
ax/utils/stats/tests/test_no_effects.pycovering: missing/nullncolumn,sem == 0with differing means (exact effect detected),sem == 0with equal means (no effect, well-defined p-value), mixed zero/positive sems, single-arm group skipping, and theK >= 2/n > 1validations.ax/analysis/healthcheck/tests/test_no_effects_analysis.py: informative error without thencolumn, PASS on deterministic data with real effects (previously a spurious WARNING), and informative error when all trials are single-arm.UserInputErrorinstead ofKeyError: 'n'.