Skip to content

Fix silent NaN results and KeyError in test of no effect#5249

Open
ricardoofnl wants to merge 1 commit into
facebook:mainfrom
ricardoofnl:fix-no-effects-edge-cases
Open

Fix silent NaN results and KeyError in test of no effect#5249
ricardoofnl wants to merge 1 commit into
facebook:mainfrom
ricardoofnl:fix-no-effects-edge-cases

Conversation

@ricardoofnl

Copy link
Copy Markdown

Summary

Fixes #5248.

TestOfNoEffectAnalysis and the underlying functions in ax/utils/stats/no_effects.py mishandled three degenerate inputs:

  1. Data without the optional n column (as produced by standard trial evaluation, e.g. Client.complete_trial) crashed with a bare KeyError: 'n'. A new _validate_sample_sizes helper now raises a UserInputError explaining that per-arm sample sizes are required, from both check_experiment_effects_per_metric and check_experiment_effects.

  2. Deterministic data (sem == 0) made Welch's test divide by zero, producing a NaN p-value that silently read as "no effect" even when the exactly-known means clearly differed — the healthcheck warned "no effects detected" in exactly the wrong situation. Zero-variance arms are now handled explicitly in no_effect_test_welch:

    • differing exactly-known means are an exact effect (p = 0),
    • identical all-deterministic means are no effect (p = 1),
    • a mixed zero/positive-sem group raises a UserInputError, since Welch's test is undefined there.
  3. Single-arm groups (K == 1) divided by K - 1 == 0 and silently produced NaN / "no effect". Such groups are now skipped in check_experiment_effects_per_metric (a single-arm group cannot show within-trial effects), no_effect_test_welch validates K >= 2 and n > 1, and TestOfNoEffectAnalysis raises a UserInputError when no trial has two or more arms.

Also corrects the check_experiment_effects docstring, which described ineffective_on_objectives with the same sentence as effective.

Test plan

  • New ax/utils/stats/tests/test_no_effects.py covering: missing/null n column, sem == 0 with differing means (exact effect detected), sem == 0 with equal means (no effect, well-defined p-value), mixed zero/positive sems, single-arm group skipping, and the K >= 2 / n > 1 validations.
  • Three new cases in ax/analysis/healthcheck/tests/test_no_effects_analysis.py: informative error without the n column, PASS on deterministic data with real effects (previously a spurious WARNING), and informative error when all trials are single-arm.
  • All 18 tests in the two touched test files pass; the end-to-end repro from [Bug]: TestOfNoEffectAnalysis crashes with KeyError 'n' on standard Client data, and silently reports "no effect" for sem=0 or single-arm trials #5248 now raises an actionable UserInputError instead of KeyError: 'n'.

TestOfNoEffectAnalysis and the underlying functions in
ax/utils/stats/no_effects.py mishandled three degenerate inputs:

1. Data without the optional `n` column (as produced by standard trial
   evaluation, e.g. `Client.complete_trial`) crashed with a bare
   KeyError: 'n'. Now raises a UserInputError explaining that per-arm
   sample sizes are required.

2. Deterministic data (sem == 0) made Welch's test divide by zero,
   producing a NaN p-value that silently read as "no effect" even when
   the exactly-known means clearly differed - the healthcheck warned
   "no effects detected" in exactly the wrong situation. Zero-variance
   arms are now handled explicitly: differing exactly-known means are
   an exact effect (p = 0), identical all-deterministic means are no
   effect (p = 1), and a mixed zero/positive-sem group raises a
   UserInputError since the test is undefined.

3. Single-arm groups (K == 1) divided by K - 1 == 0 and silently
   produced NaN / "no effect". Such groups are now skipped in
   check_experiment_effects_per_metric, no_effect_test_welch validates
   K >= 2 and n > 1, and TestOfNoEffectAnalysis raises a UserInputError
   when no trial has two or more arms.

Also corrects the check_experiment_effects docstring, which described
ineffective_on_objectives with the same sentence as effective.
@meta-cla

meta-cla Bot commented Jul 2, 2026

Copy link
Copy Markdown

Hi @ricardoofnl!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@meta-cla meta-cla Bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Jul 2, 2026
@meta-cla

meta-cla Bot commented Jul 2, 2026

Copy link
Copy Markdown

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@saitcakmak saitcakmak left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@meta-codesync

meta-codesync Bot commented Jul 3, 2026

Copy link
Copy Markdown

@saitcakmak has imported this pull request. If you are a Meta employee, you can view this in D110608351.

@codecov-commenter

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.71795% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 96.57%. Comparing base (d8eeb97) to head (75ffc87).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
ax/utils/stats/no_effects.py 94.73% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5249   +/-   ##
=======================================
  Coverage   96.56%   96.57%           
=======================================
  Files         619      620    +1     
  Lines       70307    70385   +78     
=======================================
+ Hits        67895    67972   +77     
- Misses       2412     2413    +1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed Do not delete this pull request or issue due to inactivity.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: TestOfNoEffectAnalysis crashes with KeyError 'n' on standard Client data, and silently reports "no effect" for sem=0 or single-arm trials

3 participants