feat(collection-seeder): add RSV-A/B Nextclade lineages from nextclade_data tree.json #1268

Open
fhennig wants to merge 5 commits into main from feat/rsv-nextclade-lineages-seeder
@fhennig commented Jun 10, 2026

Contributor

Closes #1256

Summary

  • Adds RsvANextcladeLineagesSource and RsvBNextcladeLineagesSource that walk the Nextclade reference tree JSON to produce one collection per RSV clade
  • Shared base class _RsvNextcladeLineagesBase handles the HTTP fetch; subclasses point at the hardcoded April 2026 snapshots from nextstrain/nextclade_data
  • Each collection gets four variants, mirroring the COVID Pango lineage shape:
    • Nucleotide substitutions — full set accumulated from the reference root (1-based genomic coordinates), expressed as REF_BASE + POSITION + CURRENT_BASE (e.g. A982T)
    • Amino acid substitutions — full AA set from reference root, expressed as GENE:REF_AA + POSITION + CURRENT_AA (e.g. F:T8A)
    • New nucleotide substitutions — branch-only delta, what this clade introduces relative to its parent
    • New amino acid substitutions — branch-only AA delta
  • Accumulation handles multi-hop mutations (A→C→G collapses to A→G) and reversions (A→C→A is dropped from the accumulated set)
  • Both sources registered in registry.py and included in the default run
  • 45 new tests covering accumulation logic, tree extraction, collection shape, source metadata, and HTTP fetch behaviour
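
The accumulation rule in the summary (multi-hop collapse and reversion dropping) can be sketched as follows. This is an illustrative sketch only, not the code in this PR: the names `parse_substitution`, `accumulate`, and `format_substitutions` are hypothetical, and it assumes each branch mutation is a plain single-character substitution string such as `A982C`.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical state shape: position -> (reference_base, current_base)
Accumulated = Dict[int, Tuple[str, str]]


def parse_substitution(mutation: str) -> Tuple[str, int, str]:
    """Split a substitution such as 'A982T' into ('A', 982, 'T')."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]


def accumulate(parent: Accumulated, branch: Iterable[str]) -> Accumulated:
    """Apply one branch's substitutions on top of the parent's accumulated set."""
    result = dict(parent)  # copy so sibling branches do not share state
    for mutation in branch:
        from_base, pos, to_base = parse_substitution(mutation)
        # The reference base is whatever the position started as at the root.
        ref_base = result[pos][0] if pos in result else from_base
        if to_base == ref_base:
            # Reversion: A -> C -> A leaves no net change, so drop the entry.
            result.pop(pos, None)
        else:
            # Multi-hop: A -> C -> G is recorded as A -> G.
            result[pos] = (ref_base, to_base)
    return result


def format_substitutions(acc: Accumulated) -> List[str]:
    """Render as REF_BASE + POSITION + CURRENT_BASE, sorted by position."""
    return [f"{ref}{pos}{cur}" for pos, (ref, cur) in sorted(acc.items())]


state = accumulate({}, ["A982C"])
state = accumulate(state, ["C982G"])
print(format_substitutions(state))  # ['A982G']
```

Keeping the reference base alongside the current base is what lets a later hop be rewritten against the root rather than against the intermediate base. The same idea would apply to amino acid substitutions if the state were keyed by (gene, position).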

Test plan

  • pixi run -e test test — all 95 tests pass
  • Seeded against local backend — 45 RSV-A and 30 RSV-B collections created/updated successfully

🤖 Generated with Claude Code

…e_data tree.json

Closes #1256.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vercel Bot commented Jun 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| dashboards | Ready | Preview, Comment | Jun 11, 2026 1:07pm |

…t for RSV Nextclade clades

Each clade collection now has four variants (matching COVID Pango shape):
- Nucleotide/AA substitutions: full set from reference root (all mutations
  accumulated along the path root → this clade)
- New nucleotide/AA substitutions: branch-only delta (what this clade step
  introduces relative to its parent)

The accumulation handles multi-hop mutations (A→C→G becomes A→G) and
reversions (A→C→A is removed from the accumulated set).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
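
For readers unfamiliar with the tree shape, here is a rough sketch of how a full set and a branch-only delta could be emitted per clade while walking a tree. It is not taken from this PR. It assumes an Auspice-style layout (`children`, `branch_attrs.mutations.nuc`, `node_attrs.clade_membership.value`), treats the first node whose clade label differs from its parent's as the clade's defining node, and reports the delta as that node's raw branch mutations. The function name `iter_clades`, the toy tree, and its clade names are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

State = Dict[int, Tuple[str, str]]  # position -> (reference_base, current_base)


def iter_clades(root: dict) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (clade, full_set, new_set) at the first node of each clade."""
    # Explicit stack instead of recursion, so deep trees do not hit the
    # interpreter's recursion limit.
    stack: List[Tuple[dict, Optional[str], State]] = [(root, None, {})]
    while stack:
        node, parent_clade, inherited = stack.pop()
        branch = node.get("branch_attrs", {}).get("mutations", {}).get("nuc", [])
        state = dict(inherited)
        for mut in branch:
            from_base, pos, to_base = mut[0], int(mut[1:-1]), mut[-1]
            ref = state[pos][0] if pos in state else from_base
            if to_base == ref:
                state.pop(pos, None)      # reversion to the reference base
            else:
                state[pos] = (ref, to_base)  # collapse multi-hop changes
        clade = node.get("node_attrs", {}).get("clade_membership", {}).get("value")
        if clade is not None and clade != parent_clade:
            full = [f"{r}{p}{c}" for p, (r, c) in sorted(state.items())]
            yield clade, full, list(branch)
        # Push children in reverse so they are visited in their original order.
        for child in reversed(node.get("children", [])):
            stack.append((child, clade, state))


# Toy tree with made-up clade names, for illustration only.
toy_tree = {
    "node_attrs": {"clade_membership": {"value": "A"}},
    "children": [{
        "node_attrs": {"clade_membership": {"value": "A.1"}},
        "branch_attrs": {"mutations": {"nuc": ["A982C"]}},
        "children": [{
            "node_attrs": {"clade_membership": {"value": "A.1.1"}},
            "branch_attrs": {"mutations": {"nuc": ["C982G", "T5A"]}},
        }],
    }],
}

for clade, full, new in iter_clades(toy_tree):
    print(clade, full, new)
# A [] []
# A.1 ['A982C'] ['A982C']
# A.1.1 ['T5A', 'A982G'] ['C982G', 'T5A']
```

In this sketch a clade label that reappears in a separate subtree would be yielded more than once, so a real source would need to decide how to handle that case.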
…wn by call depth

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eference base

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread collection-seeding/sources/rsv_nextclade_lineages.py
…ades

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>