feature: Add Nuclei per Cell plot; fix Fraction of Transcripts in Nucleus #4

Open
nmalwinka wants to merge 2 commits into MultiQC:main from
nmalwinka:feature/nuclei_per_cell

@nmalwinka

Summary

This PR adds a new QC plot to the Xenium report and fixes a latent bug in an
existing plot. Both changes touch nucleus-related metrics and are bundled
together because they share code paths and data sources.

1. New: "Nuclei per Cell" stacked bar

Adds a per-sample stacked bar showing the distribution of cells grouped by
number of segmented nuclei per cell:

  • 0 nuclei (red) — cells with no detected nucleus. Useful as an
    under-segmentation / boundary-stain-failure signal.
  • 1 nucleus (green) — the expected majority for most tissues.
  • 2+ nuclei (blue) — multi-nucleated cells. May reflect merge errors
    during segmentation, or genuine biology (skeletal muscle, cardiac).

Data source: the nucleus_count column already present in cells.parquet.
The buckets are computed in the existing lazy polars scan in
parse_cells_parquet, so there is no new file I/O. The computation is guarded
by a schema check (if "nucleus_count" in schema), so older Xenium outputs
that lack the column skip the section gracefully.
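
For reviewers, the bucketing and the column guard can be sketched in plain Python. This is only an illustration of the logic, not the code in this PR (which runs inside the polars lazy scan). The function name, the `columns` mapping, and the output key names are hypothetical.

```python
from collections import Counter

BUCKETS = ("0 nuclei", "1 nucleus", "2+ nuclei")


def nuclei_per_cell_summary(columns):
    """Bucket cells by nucleus_count; return None when the column is absent.

    `columns` is a hypothetical column-name -> list-of-values mapping that
    stands in for the cells.parquet table.
    """
    if "nucleus_count" not in columns:  # mirrors the schema guard
        return None

    counts = Counter({bucket: 0 for bucket in BUCKETS})
    for n in columns["nucleus_count"]:
        if n is None:
            continue  # skip missing values rather than guess a bucket
        if n == 0:
            counts["0 nuclei"] += 1
        elif n == 1:
            counts["1 nucleus"] += 1
        else:
            counts["2+ nuclei"] += 1

    total = sum(counts.values())
    if total == 0:
        return None
    return {
        "counts": dict(counts),
        # Stored as fractions in [0, 1], as described for the hidden columns.
        "fraction_zero_nuclei": counts["0 nuclei"] / total,
        "fraction_multi_nuclei": counts["2+ nuclei"] / total,
    }


summary = nuclei_per_cell_summary({"nucleus_count": [1, 1, 0, 2, 1, 3, 1, 1]})
```

Returning None for a missing column is what lets the caller skip the section without raising on older outputs.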

Also adds two hidden general-stats columns: % 0-Nuclei Cells (Reds) and
% Multi-Nuclei Cells (Purples). Hidden by default so they don't crowd
the table; users can opt them in via "Configure Columns". Values stored
as fractions in [0, 1]; rendered as percentages via
modify: lambda x: x * 100.0 to match the fraction_transcripts_decoded_q20
convention.
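
A rough sketch of what the two hidden column definitions might look like, assuming MultiQC's usual general-stats header keys (title, scale, hidden, modify, suffix, min, max). The metric key names and description strings are hypothetical and may differ from the actual diff.

```python
# Hypothetical metric keys and descriptions; only the titles, colour scales,
# hidden flag, and modify lambda come from the PR description.
general_stats_headers = {
    "fraction_zero_nuclei": {
        "title": "% 0-Nuclei Cells",
        "description": "Percentage of cells with no segmented nucleus",
        "scale": "Reds",
        "suffix": "%",
        "min": 0,
        "max": 100,
        "hidden": True,  # opt in via "Configure Columns"
        "modify": lambda x: x * 100.0,  # stored fraction -> displayed percent
    },
    "fraction_multi_nuclei": {
        "title": "% Multi-Nuclei Cells",
        "description": "Percentage of cells with two or more segmented nuclei",
        "scale": "Purples",
        "suffix": "%",
        "min": 0,
        "max": 100,
        "hidden": True,
        "modify": lambda x: x * 100.0,
    },
}


def displayed_value(metric, stored_fraction):
    """Apply the header's modify function the way the table renderer would."""
    return general_stats_headers[metric]["modify"](stored_fraction)
```

Keeping the stored value as a fraction means the same number can be reused by plots that expect [0, 1], while the table shows a percentage.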

2. Bug fix: "Fraction of Transcripts in Nucleus"

The existing plot computed its metric as:

nucleus_count / total_counts   # both from cells.parquet

This is wrong because nucleus_count is not a transcript count. Per the Xenium
schema, it is the number of segmented nuclei per cell (typically 0, 1, or 2+),
so dividing it by total_counts produces a tiny number with no biological
meaning.

The plot is now correctly derived from transcripts.parquet using the
per-transcript overlaps_nucleus flag, grouped by cell_id (with the
UNASSIGNED sentinel filtered):

fraction_in_nucleus = sum(overlaps_nucleus) / count(*) # per cell_id

The plot function itself (xenium_nucleus_rna_fraction_plot)
is unchanged — only its data source moves from parse_cells_parquet to
parse_transcripts_parquet (one-line edit at the call site).
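
To make the difference concrete, here is a small plain-Python sketch of the corrected per-cell computation. It is a stand-in for the polars group-by in parse_transcripts_parquet, not a copy of it. The function name and the (cell_id, overlaps_nucleus) tuple layout are assumptions for illustration.

```python
from collections import defaultdict

UNASSIGNED = "UNASSIGNED"  # sentinel cell_id for transcripts outside any cell


def fraction_in_nucleus(transcripts):
    """Return {cell_id: fraction of its transcripts overlapping the nucleus}.

    `transcripts` is an iterable of (cell_id, overlaps_nucleus) pairs, a
    hypothetical stand-in for rows of transcripts.parquet.
    """
    in_nucleus = defaultdict(int)
    total = defaultdict(int)
    for cell_id, overlaps in transcripts:
        if cell_id == UNASSIGNED:
            continue  # unassigned transcripts belong to no cell
        total[cell_id] += 1
        in_nucleus[cell_id] += int(bool(overlaps))
    return {cell_id: in_nucleus[cell_id] / total[cell_id] for cell_id in total}


rows = [
    ("cell_a", 1), ("cell_a", 1), ("cell_a", 0), ("cell_a", 0),
    ("cell_b", 1),
    ("UNASSIGNED", 0), ("UNASSIGNED", 1),
]
fractions = fraction_in_nucleus(rows)

# The old metric for a cell with one nucleus and 250 transcripts would have
# been 1 / 250, regardless of where those transcripts were located.
old_metric_example = 1 / 250
```

Here cell_a has two of four transcripts in the nucleus and cell_b has one of one, whereas the old formula would have reported a value driven only by how many transcripts the cell had.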

Bumps version to 1.1.0.

@nmalwinka
Author

Added a further bug fix for single-sample density plots. Three cell-related
sections ("Cell Area Distribution", "Nucleus to Cell Area", and the
transcripts-per-cell side of "Distribution of Transcripts/Genes per Cell")
were silently skipped on single-sample reports because the parser stored only
summary statistics, while the density helpers require raw per-cell values.

parse_cells_parquet now emits cell_area_values,
nucleus_to_cell_area_ratio_values, and total_counts_values alongside the
existing box-stats summaries, so all three sections render as KDE density
curves on single-sample reports (matching the existing helptext).
Multi-sample reports are unaffected.
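
The shape of this fix can be illustrated as follows. This is a simplified sketch rather than the parser code: cell_area_values is the key named above, while the summary key names and both helper functions are hypothetical.

```python
import statistics


def summarize_cell_area(cell_areas):
    """Return summary stats plus the raw values needed for a density curve."""
    values = [float(a) for a in cell_areas if a is not None]
    if not values:
        return {}
    return {
        # Hypothetical summary keys standing in for the box-stats fields.
        "cell_area_min": min(values),
        "cell_area_median": statistics.median(values),
        "cell_area_max": max(values),
        # Raw per-cell values; without these a single-sample report has
        # nothing to feed a KDE.
        "cell_area_values": values,
    }


def can_render_density(sample_data):
    """A density curve needs more than one raw value for the sample."""
    return len(sample_data.get("cell_area_values", [])) > 1


stats = summarize_cell_area([40.0, 55.5, None, 62.0])
```

A sample that carries only the summary keys fails the `can_render_density` check, which matches the silent skip described above.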
