
refactor: uniformize somatic-variant definition across bin scripts #469

Open
m-huertasp wants to merge 9 commits into dev from tests/418-uniformize-somatic-variants

@m-huertasp
Collaborator

Small PR.

What this does

Establishes a single definition of a somatic variant and applies it
consistently across the bin/ scripts. Previously the somatic/germline split was
re-implemented inline in several places with divergent forms (single-column VAF
in filter_cohort.py, three-column elsewhere, plus hardcoded thresholds in
check_contamination.py), which made the somatic and germline sets non-symmetric.
The definition now lives in two shared helpers and every call site uses them. Closes #418.

Changes

  • Add somatic_mask / germline_mask to bin/utils_filter.py — the canonical all-3-column
    rule (VAF, vd_VAF, VAF_AM vs a caller-supplied threshold), with NumPy docstrings.
  • Refactor bin/filter_cohort.py to use the helpers (somatic flagging goes single-column → all-3-column).
  • Refactor bin/check_contamination.py: germline/SNP predicates use the helpers; the hardcoded
    0.2 threshold is replaced by a new --somatic-vaf-boundary CLI option fed from
    params.germline_threshold (wired via modules/local/contamination/main.nf + conf/modules.config);
    NumPy docstrings added to all public functions.
  • Add bin/test/test_utils_filter.py — 46 unit tests covering every public function in utils_filter.py.
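To make the all-3-column rule concrete, here is a minimal sketch of what the two helpers might look like. It is not the code in bin/utils_filter.py: plain lists of dicts stand in for the real table, and the comparison directions (somatic when all three columns are at or below the threshold, germline when all three are above it) are assumptions, since the description only names the columns and the caller-supplied threshold. The PR may equally define germline as the exact complement of somatic.

```python
# Column names are taken from the PR description.
VAF_COLUMNS = ("VAF", "vd_VAF", "VAF_AM")


def somatic_mask(rows, threshold):
    """Flag rows whose three VAF columns are all <= threshold (assumed rule)."""
    return [all(row[col] <= threshold for col in VAF_COLUMNS) for row in rows]


def germline_mask(rows, threshold):
    """Flag rows whose three VAF columns are all > threshold (assumed rule)."""
    return [all(row[col] > threshold for col in VAF_COLUMNS) for row in rows]


# Illustrative values only, not data from the pipeline.
rows = [
    {"VAF": 0.05, "vd_VAF": 0.04, "VAF_AM": 0.06},  # low in all three columns
    {"VAF": 0.48, "vd_VAF": 0.51, "VAF_AM": 0.50},  # high in all three columns
    {"VAF": 0.25, "vd_VAF": 0.35, "VAF_AM": 0.28},  # columns disagree
]

print(somatic_mask(rows, 0.3))   # [True, False, False]
print(germline_mask(rows, 0.3))  # [False, True, False]
```

Under these assumed definitions, a row whose columns disagree falls in neither set. Whether that is the intended behaviour is worth confirming against the helpers' docstrings.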

What to review

  • Behaviour changes: filter_cohort somatic flagging is now all-3-column; the
    between-samples contamination threshold moves 0.2 → 0.3 (default germline_threshold).
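The threshold change can be illustrated with a small sketch. It assumes check_contamination.py parses its options with argparse (the script may use a different CLI library) and that a variant counts as germline when all of its VAF values exceed the boundary. The helper name exceeds_boundary and the example values are hypothetical.

```python
import argparse


def build_parser():
    """Parser with the option name from the PR; everything else is illustrative."""
    parser = argparse.ArgumentParser(description="Contamination check (sketch)")
    parser.add_argument(
        "--somatic-vaf-boundary",
        type=float,
        required=True,
        help="VAF boundary separating somatic from germline variants",
    )
    return parser


def exceeds_boundary(vaf_values, boundary):
    """Hypothetical predicate: True when every VAF value is above the boundary."""
    return all(value > boundary for value in vaf_values)


# Simulate the value the pipeline would pass from params.germline_threshold.
args = build_parser().parse_args(["--somatic-vaf-boundary", "0.3"])

variant = (0.25, 0.27, 0.26)  # illustrative VAF, vd_VAF, VAF_AM values

old_call = exceeds_boundary(variant, 0.2)                        # previous hardcoded value
new_call = exceeds_boundary(variant, args.somatic_vaf_boundary)  # new default of 0.3

print(old_call, new_call)  # True False
```

Variants with VAF values between 0.2 and 0.3 are the ones whose classification can change, so those are the cases to check when reviewing the contamination output.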

Testing

pytest bin/test/test_utils_filter.py → 46 passed; ruff is clean on all touched files.
The full bin/test/ suite has 3 pre-existing failures + 1 collection error in unrelated files
(test_plot_selectionsideplots.py, test_check_samplesheet.py) that are untouched by this PR.

@m-huertasp m-huertasp changed the base branch from main to dev June 2, 2026 13:25

Development

Successfully merging this pull request may close these issues.

uniformize the definition of somatic variants

1 participant