
Fix all-NaN extra obs after PseudobulkSpace with groups_col #1006

Open

Zethson wants to merge 1 commit into main from fix/pseudobulk-extra-obs-1003




Zethson commented Jun 2, 2026

Summary

  • Closes #1003 (Empty index when using pt.tl.PseudobulkSpace()).
  • When PseudobulkSpace.compute was called with groups_col, every obs column not produced by sc.get.aggregate (e.g. Efficacy, Treatment) came out all-NaN. The per-group lookup is indexed by a MultiIndex(target_col, groups_col), but it was being reindexed against ps_adata.obs.index, which is the joined "target_groups" string index, so no label ever matched.
  • The downstream symptom is the empty-index design matrix that DeseqDataSet rejects: with all-NaN factors, formulaic drops every row.
  • Fix: reindex the per-group lookup using the grouping columns themselves (single Index when there is one grouping col, MultiIndex.from_frame when there are two), then re-attach ps_adata.obs.index.
  • Added a regression test that asserts the extra obs column is preserved when groups_col is set.
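A minimal pandas reproduction of the mismatch in the first two bullets. The column names `patient`, `cluster`, and `Treatment` and the toy values are illustrative assumptions, not pertpy internals. `dropna` is only a stand-in for the row dropping the PR attributes to formulaic; it is not formulaic itself.

```python
import pandas as pd

# Toy cell-level obs: "patient" stands in for target_col, "cluster" for
# groups_col, and "Treatment" for an extra column that aggregation does not keep.
obs = pd.DataFrame(
    {
        "patient": ["P1", "P1", "P2", "P2"],
        "cluster": ["A", "B", "A", "B"],
        "Treatment": ["drug", "drug", "ctrl", "ctrl"],
    }
)

# Per-group lookup of the extra column, keyed by MultiIndex(patient, cluster).
lookup = obs.groupby(["patient", "cluster"])["Treatment"].first()

# Pseudobulk obs index: one joined "target_groups" string per group.
ps_index = pd.Index(["P1_A", "P1_B", "P2_A", "P2_B"])

# Tuple keys like ("P1", "A") never equal strings like "P1_A",
# so reindexing finds no matches and every value becomes NaN.
extra = lookup.reindex(ps_index)
print(extra.isna().all())

# Dropping rows with a missing factor (stand-in for formulaic's NA handling)
# leaves nothing, which is the empty design matrix DeseqDataSet rejects.
design_input = extra.to_frame().dropna()
print(len(design_input))
```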

When `groups_col` is provided, `ps_adata.obs.index` is a joined string
like "patient_cluster", so reindexing the per-group lookup against that
index returned NaN for every extra column. The all-NaN `obs` then made
formulaic drop every row, producing an empty-index design matrix that
`DeseqDataSet` rejected (#1003). Reindex by the grouping
columns themselves instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
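A sketch of "reindex by the grouping columns themselves" in plain pandas, assuming both the cell-level obs and the pseudobulk obs are DataFrames that still carry the grouping columns. `grouping_key` and `attach_extra_column` are hypothetical helpers for illustration, not functions in pertpy; the branch on the number of grouping columns mirrors the single-Index vs `MultiIndex.from_frame` split described in the summary.

```python
import pandas as pd


def grouping_key(ps_obs: pd.DataFrame, cols: list[str]) -> pd.Index:
    """Build the reindex key from the grouping columns (hypothetical helper)."""
    if len(cols) == 1:
        # Only target_col: a plain Index of its values.
        return pd.Index(ps_obs[cols[0]], name=cols[0])
    # target_col plus groups_col: a MultiIndex matching the groupby lookup.
    return pd.MultiIndex.from_frame(ps_obs[cols])


def attach_extra_column(
    obs: pd.DataFrame, ps_obs: pd.DataFrame, cols: list[str], extra: str
) -> pd.DataFrame:
    """Copy a per-group obs column onto pseudobulk obs (hypothetical helper)."""
    lookup = obs.groupby(cols)[extra].first()
    values = lookup.reindex(grouping_key(ps_obs, cols))
    out = ps_obs.copy()
    # Assign positionally so the joined string index of ps_obs is kept.
    out[extra] = values.to_numpy()
    return out


obs = pd.DataFrame(
    {
        "patient": ["P1", "P1", "P2"],
        "cluster": ["A", "B", "A"],
        "Treatment": ["drug", "drug", "ctrl"],
    }
)

# Two grouping columns: pseudobulk rows indexed by "patient_cluster" strings.
ps_two = pd.DataFrame(
    {"patient": ["P1", "P1", "P2"], "cluster": ["A", "B", "A"]},
    index=["P1_A", "P1_B", "P2_A"],
)
out_two = attach_extra_column(obs, ps_two, ["patient", "cluster"], "Treatment")

# One grouping column: pseudobulk rows indexed by patient.
ps_one = pd.DataFrame({"patient": ["P1", "P2"]}, index=["P1", "P2"])
out_one = attach_extra_column(obs, ps_one, ["patient"], "Treatment")

print(out_two)
print(out_one)
```

Assigning `values.to_numpy()` rather than the reindexed Series avoids a second label alignment against the string index, which is the same kind of mismatch that caused the bug.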
github-actions bot added the bug (Something isn't working) label Jun 2, 2026
Zethson requested a review from LuisHeinzlmeier June 2, 2026 09:17
