fix: handle infinite values in anomaly score calculation #100
Merged
Pull request overview
This PR prevents a crash in the high-precision (decimal) anomaly score recomputation path when the computed log survival function underflows and dcm_binom_logsf returns np.inf (a float), which previously led to a float / Decimal TypeError. It aligns with issue #87 by ensuring extreme clustering signals can be represented as an infinite clustering score rather than terminating execution.
Changes:
- Add a guard in get_dcm_anomaly_score to detect an infinite (float) numerator and return np.inf early.
- Preserve existing p-value computation behavior; only the clustering score recomputation path is made resilient to precision-limit overflow.
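The guard described in the list above might look roughly like the following. This is a minimal sketch, not the code from the PR: the function name anomaly_score_sketch and its two parameters are hypothetical, and math.inf is used in place of np.inf (the two compare equal) so the snippet needs only the standard library.

```python
import math
from decimal import Decimal, getcontext

# 600 digits is the precision quoted in the PR description.
getcontext().prec = 600


def anomaly_score_sketch(log_sf, denominator):
    """Hypothetical stand-in for the guarded division in get_dcm_anomaly_score."""
    # A plain float infinity means the high-precision log survival function
    # also hit its limit, so report an infinite clustering score directly.
    if isinstance(log_sf, float) and math.isinf(log_sf):
        return math.inf
    # Otherwise both operands are Decimal and the division is safe.
    return log_sf / denominator
```

With this shape, a finite Decimal numerator still goes through the high-precision division, and only the float infinity case returns early.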
When an extreme density of mutations is observed on a residue, the probability of seeing it by chance is so small that it gets approximated to zero, and the clustering score becomes infinite. When this happens, Oncodrive3D tries to recompute the score using the decimal package (get_dcm_anomaly_score), which supports 600-digit precision.
In the reported case, the signal was so high that even 600-digit precision wasn't enough and it hit the precision limit again. This case wasn't expected: dcm_binom_logsf returned np.inf (a float) instead of a Decimal, and the next line tried float / Decimal, which crashed with a TypeError.
closes #87
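The failure mode can be reproduced in isolation, since Python's decimal module does not allow mixing float and Decimal in arithmetic. The helper name divide_or_error_name below is hypothetical and exists only to capture the exception type for illustration.

```python
from decimal import Decimal


def divide_or_error_name(numerator, denominator):
    """Return the quotient, or the exception class name if the division fails."""
    try:
        return numerator / denominator
    except TypeError as exc:
        return type(exc).__name__


# A float infinity divided by a Decimal is rejected by the decimal module.
result = divide_or_error_name(float("inf"), Decimal("2"))
print(result)
```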
Fix
Check if the high-precision result is again inf. In that case, just return inf as the clustering score. Nothing changes at the level of p-value calculation.
Tests
TypeError
Status = Processed and a significant p-value