refactor: uniformize somatic-variant definition across bin scripts #469
Open
m-huertasp wants to merge 9 commits into
- style: black-format filter cohort and check contamination
Small PR.
What this does
Establishes a single definition of a somatic variant and applies it consistently across the bin/ scripts. Previously the somatic/germline split was re-implemented inline in several places with divergent forms (single-column VAF in filter_cohort.py, three-column elsewhere, plus hardcoded thresholds in check_contamination.py), which made the somatic and germline sets non-symmetric. The definition now lives in two shared helpers and every call site uses them. Closes #418.
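To make the single-column vs all-3-column difference concrete, here is a minimal sketch of what such a pair of helpers could look like. It is not the code in bin/utils_filter.py: the pandas DataFrame input, the `<=` comparison direction, and germline being defined as the exact complement of somatic are all assumptions made for illustration. Only the function names and the three column names come from the description above.

```python
import pandas as pd

# Column names as listed in the PR description.
VAF_COLUMNS = ["VAF", "vd_VAF", "VAF_AM"]


def somatic_mask(df: pd.DataFrame, threshold: float) -> pd.Series:
    """Assumed rule: somatic when ALL three VAF columns are <= threshold."""
    return (df[VAF_COLUMNS] <= threshold).all(axis=1)


def germline_mask(df: pd.DataFrame, threshold: float) -> pd.Series:
    """Assumed rule: exact complement of somatic_mask, so the sets are symmetric."""
    return ~somatic_mask(df, threshold)


# Toy data, invented for this example.
variants = pd.DataFrame(
    {
        "VAF": [0.05, 0.45, 0.10],
        "vd_VAF": [0.04, 0.50, 0.40],
        "VAF_AM": [0.06, 0.48, 0.12],
    }
)

somatic = somatic_mask(variants, threshold=0.3)
germline = germline_mask(variants, threshold=0.3)

# The old single-column form looks at VAF only and disagrees on the last row,
# where vd_VAF is above the threshold.
single_column = variants["VAF"] <= 0.3
```

Under these assumptions every row falls into exactly one of the two sets, and the last row shows how a single-column check can flag a variant as somatic that the all-3-column rule does not.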
Changes
- Add somatic_mask / germline_mask to bin/utils_filter.py: the canonical all-3-column rule (VAF, vd_VAF, VAF_AM vs a caller-supplied threshold), with NumPy docstrings.
- Refactor bin/filter_cohort.py to use the helpers (somatic flagging goes single-column → all-3-column).
- bin/check_contamination.py: germline/SNP predicates use the helpers; the hardcoded 0.2 threshold is replaced by a new --somatic-vaf-boundary CLI option fed from params.germline_threshold (wired via modules/local/contamination/main.nf + conf/modules.config); NumPy docstrings added to all public functions.
- Add bin/test/test_utils_filter.py: 46 unit tests covering every public function in utils_filter.py.
What to review
filter_cohort somatic flagging is now all-3-column; the between-samples contamination threshold moves 0.2 → 0.3 (default germline_threshold).
Testing
pytest bin/test/test_utils_filter.py → 46 passed + ruff clean on all touched files. The full bin/test/ suite has 3 pre-existing failures + 1 collection error in unrelated files (test_plot_selectionsideplots.py, test_check_samplesheet.py) that are untouched by this PR.
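For reviewers weighing the 0.2 → 0.3 threshold move, the sketch below shows how a configurable boundary changes the classification of a borderline variant. It is illustrative only: the use of argparse, the in-script default of 0.3 (mirroring the stated default of germline_threshold), and the helper is_germline are assumptions, not the actual contents of check_contamination.py. Only the option name --somatic-vaf-boundary comes from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the boundary as a CLI option."""
    parser = argparse.ArgumentParser(description="Contamination check (sketch)")
    parser.add_argument(
        "--somatic-vaf-boundary",
        type=float,
        default=0.3,  # assumed default, mirroring germline_threshold
        help="VAF boundary separating somatic from germline variants",
    )
    return parser


def is_germline(vafs, boundary):
    """Hypothetical predicate: germline if ANY of the VAF values exceeds the boundary."""
    return any(v > boundary for v in vafs)


# A borderline variant (VAF, vd_VAF, VAF_AM), invented for this example.
borderline = (0.25, 0.24, 0.26)

old_args = build_parser().parse_args(["--somatic-vaf-boundary", "0.2"])
new_args = build_parser().parse_args([])  # falls back to the assumed default

germline_at_old = is_germline(borderline, old_args.somatic_vaf_boundary)
germline_at_new = is_germline(borderline, new_args.somatic_vaf_boundary)
```

With these assumed definitions, the borderline variant counts as germline at the old 0.2 boundary and as somatic at 0.3, which is the kind of shift worth checking on real between-samples contamination results.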