feature: Add Nuclei per Cell plot; fix Fraction of Transcripts in Nucleus #4
Open
nmalwinka wants to merge 2 commits into
Added Bug fix: single-sample density plots — three cell-related sections ("Cell Area Distribution", "Nucleus to Cell Area", and the transcripts-per-cell side of "Distribution of Transcripts/Genes per Cell") were silently skipping on |
Summary
This PR adds a new QC plot to the Xenium report and fixes a latent bug in an
existing plot. Both changes touch nucleus-related metrics and are bundled
together because they share code paths and data sources.
1. New: "Nuclei per Cell" stacked bar
Adds a per-sample stacked bar showing the distribution of cells grouped by
number of segmented nuclei per cell:
under-segmentation / boundary-stain-failure signal.
during segmentation, or genuine biology (skeletal muscle, cardiac).
Data source:
nucleus_count column already present in cells.parquet. Computed in the existing lazy polars scan in
parse_cells_parquet; no new file I/O. Schema-guarded by an
if "nucleus_count" in schema: check, so older Xenium outputs that lack the column skip the section gracefully.
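As a rough illustration of the schema guard and the per-sample distribution, here is a plain-Python sketch. It is not the polars code from this PR: the function name `nuclei_per_cell_fractions`, the bin labels 0 / 1 / 2+, and the sample values are assumptions made for the example.

```python
from collections import Counter


def nuclei_per_cell_fractions(schema, nucleus_counts):
    """Return the fraction of cells with 0, 1 and 2+ nuclei.

    Returns None when the nucleus_count column is missing, mirroring the
    schema guard that lets older outputs skip the section.
    Assumes nucleus_counts holds non-negative integers.
    """
    if "nucleus_count" not in schema:
        return None  # column absent: skip the section

    # Collapse every count of two or more into a single "2+" bin.
    binned = Counter("2+" if n >= 2 else str(n) for n in nucleus_counts)
    total = sum(binned.values())
    if total == 0:
        return None  # no cells, nothing to plot

    return {label: binned.get(label, 0) / total for label in ("0", "1", "2+")}


# Hypothetical sample: eight cells with their segmented-nucleus counts.
example = nuclei_per_cell_fractions(
    schema={"cell_id", "nucleus_count", "total_counts"},
    nucleus_counts=[0, 1, 1, 1, 2, 1, 3, 1],
)
print(example)
```

In a lazy polars scan the same guard would be evaluated against the scanned schema before any aggregation is added, so no extra file read is needed.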
Also adds two hidden general-stats columns: % 0-Nuclei Cells (Reds) and
% Multi-Nuclei Cells (Purples). Hidden by default so they don't crowd
the table; users can opt them in via "Configure Columns". Values stored
as fractions in [0, 1]; rendered as percentages via
modify: lambda x: x * 100.0 to match the fraction_transcripts_decoded_q20 convention.
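The store-as-fraction, render-as-percentage convention might look roughly like the sketch below. The dictionary keys `fraction_zero_nuclei_cells` and `fraction_multi_nuclei_cells` are hypothetical names chosen for illustration; only the titles, colour scales, hidden flag, and the `modify` lambda come from the description above.

```python
# Hypothetical general-stats header definitions (key names are assumptions).
general_stats_headers = {
    "fraction_zero_nuclei_cells": {
        "title": "% 0-Nuclei Cells",
        "scale": "Reds",
        "hidden": True,  # users opt in via "Configure Columns"
        "modify": lambda x: x * 100.0,  # stored fraction -> displayed percent
    },
    "fraction_multi_nuclei_cells": {
        "title": "% Multi-Nuclei Cells",
        "scale": "Purples",
        "hidden": True,
        "modify": lambda x: x * 100.0,
    },
}

# Values are stored as fractions in [0, 1] ...
stored = {
    "fraction_zero_nuclei_cells": 0.125,
    "fraction_multi_nuclei_cells": 0.25,
}

# ... and only converted to percentages at render time.
rendered = {
    key: general_stats_headers[key]["modify"](value)
    for key, value in stored.items()
}
print(rendered)
```

Keeping the raw fraction in the stored data means downstream consumers see the same units as the other fraction-valued columns, and the scaling lives in one place.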
2. Bug fix: "Fraction of Transcripts in Nucleus"
The existing plot computed its metric by dividing nucleus_count by total_counts.
This is wrong, as nucleus_count is not a transcript count. Per the Xenium schema it is the count of segmented nuclei per cell (typically 0, 1, or 2+), so dividing it by total_counts produces a meaningless tiny number.
The plot is now correctly derived from transcripts.parquet using the
per-transcript overlaps_nucleus flag, grouped by cell_id (with the
UNASSIGNED sentinel filtered):

    fraction_in_nucleus = sum(overlaps_nucleus) / count(*)  # per cell_id

The plot function itself (xenium_nucleus_rna_fraction_plot)
is unchanged; only its data source moves from parse_cells_parquet to
parse_transcripts_parquet (one-line edit at the call site).
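The corrected per-cell metric can be sketched in plain Python as follows. This is an illustration of the grouping logic only, not the polars implementation in this PR; the function name `fraction_in_nucleus`, the tuple layout of the rows, and the sample data are assumptions.

```python
from collections import defaultdict

UNASSIGNED = "UNASSIGNED"  # sentinel cell_id for transcripts outside any cell


def fraction_in_nucleus(transcripts):
    """Compute sum(overlaps_nucleus) / count(*) for each cell_id.

    transcripts: iterable of (cell_id, overlaps_nucleus) pairs, where
    overlaps_nucleus is 0/1 or a bool. Unassigned transcripts are dropped.
    """
    totals = defaultdict(int)
    in_nucleus = defaultdict(int)
    for cell_id, overlaps_nucleus in transcripts:
        if cell_id == UNASSIGNED:
            continue  # filter the sentinel before grouping
        totals[cell_id] += 1
        in_nucleus[cell_id] += int(bool(overlaps_nucleus))
    return {cell_id: in_nucleus[cell_id] / totals[cell_id] for cell_id in totals}


# Hypothetical transcript rows: (cell_id, overlaps_nucleus).
rows = [
    ("cell_a", 1), ("cell_a", 0), ("cell_a", 1), ("cell_a", 1),
    ("cell_b", 0), ("cell_b", 1),
    ("UNASSIGNED", 0), ("UNASSIGNED", 1),
]
fractions = fraction_in_nucleus(rows)
print(fractions)
```

Because both numerator and denominator are transcript counts for the same cell, the result is always a fraction in [0, 1], unlike the earlier ratio.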
Bumps version to 1.1.0.