
docs(lapis): fix maintainer docs #1703

Open
chaoran-chen wants to merge 1 commit into main from update-maintainer-docs

Conversation

chaoran-chen commented May 23, 2026

Member

The pages on database configuration and preprocessing were outdated. This PR updates them, mostly relying on Claude. It would be great if someone more familiar with the setups could review and adapt.

Relevant preview pages:

PR Checklist

  • [ ] All necessary documentation has been adapted.
  • [ ] All necessary changes are explained in the llms.txt.
  • [ ] The implemented feature is covered by an appropriate test.

vercel Bot commented May 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| ------- | ---------- | ---------------- | ------------------- |
| lapis | Ready | Preview, Comment | May 31, 2026 6:54pm |


Copilot AI left a comment

Contributor

Pull request overview

Updates LAPIS maintainer reference docs to reflect current SILO/LAPIS configuration and preprocessing expectations, replacing older mixed-format and component-driven documentation with a more direct, SILO-referenced spec.

Changes:

  • Rewrite preprocessing reference to emphasize SILO-driven preprocessing, NDJSON ingestion, and updated config keys (lineage/phylo/incremental sections).
  • Restructure database_config.yaml reference into explicit top-level/schema/metadata/features sections and remove conditional/component-rendered content.
  • Remove the docs component previously used to render metadata type documentation (and inline the type list instead).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| ---- | ----------- |
| `lapis-docs/src/content/docs/maintainer-docs/references/preprocessing.mdx` | Rewritten preprocessing reference (NDJSON schema, config keys, lineage/phylo/incremental notes). |
| `lapis-docs/src/content/docs/maintainer-docs/references/database-configuration.mdx` | Reworked database config reference into a more explicit spec and removed component-based rendering. |
| `lapis-docs/src/components/Configuration/MetadataTypesList.astro` | Deleted docs component used for listing metadata types (verified no remaining references in repo). |


Pages on database configuration and preprocessing were outdated.
configuration and input format that maintainers need to know in order to operate LAPIS.
For the authoritative reference, see the [SILO repository](https://github.com/GenSpectrum/LAPIS-SILO),
in particular the documents in [`documentation/`](https://github.com/GenSpectrum/LAPIS-SILO/tree/main/documentation)
(`input_format.md`, `lineage_definitions.md`, `phylogenetic_queries.md`, `incremental_preprocessing.md`).

Contributor

I'm not sure whether we should mention specific filenames here. It's prone to becoming stale when someone renames files in SILO.

- a TSV file with the metadata
- FASTA files with the sequences
SILO ingests data in [NDJSON](https://ndjson.org/) format (Newline-Delimited JSON). One JSON object per line
describes a single sequence record. There is no separate TSV/FASTA input mode.
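
To make the "one JSON object per line" rule concrete, here is a minimal Python sketch of an NDJSON reader. It is not SILO's parser; the function name `read_ndjson` and the record fields `id` and `date` are hypothetical and only illustrate the line-per-record layout.

```python
import io
import json


def read_ndjson(stream):
    """Yield one dict per non-empty line of an NDJSON stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_number}: invalid JSON ({err.msg})") from err
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        yield record


# Hypothetical records; the field names are illustrative only.
sample = io.StringIO(
    '{"id": "seq1", "date": "2021-03-04"}\n'
    '{"id": "seq2", "date": "2021-05-06"}\n'
)
records = list(read_ndjson(sample))
```

Reading line by line keeps memory use flat for large inputs, which is the usual reason to prefer NDJSON over a single JSON array.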

Contributor

Suggested change
describes a single sequence record. There is no separate TSV/FASTA input mode.
describes a single sequence record.

or do you consider this especially relevant for the target audience?

Contributor

There is still the tutorial (/maintainer-docs/tutorials/start-lapis-and-silo) - let's flag it as outdated?

Contributor

Do we actually still want to maintain this page here? It's quite unrelated to LAPIS itself. IMO we could move most of this file to the SILO docs and shorten this to briefly explaining the concept and then referring to the SILO docs.

| --------------------------- | ------ | -------- | ----------------------------------------------------------------------------------------------------- |
| `schema` | object | true | The [schema object](#the-schema-object). |
| `defaultNucleotideSequence` | string | false | Name of the default nucleotide sequence segment. Only meaningful when there is more than one segment. |
| `defaultAminoAcidSequence` | string | false | Name of the default amino acid gene |

Contributor

nit: Missing period at end of defaultAminoAcidSequence description — all other rows in this table end with a period.

Comment on lines +59 to +75
<ul>
<li>
<code>string</code>: Arbitrary text values.
</li>
<li>
<code>int</code>: Integer values.
</li>
<li>
<code>float</code>: Floating-point values.
</li>
<li>
<code>boolean</code>: <code>true</code> or <code>false</code>.
</li>
<li>
<code>date</code>: Values must be valid dates in the form <code>YYYY-MM-DD</code>.
</li>
</ul>
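
As an illustration of the five metadata types listed above, the following Python sketch checks already-parsed JSON values against a type name. It is an assumption-laden example, not LAPIS or SILO validation code: the function name `is_valid_metadata_value` is hypothetical, and accepting integers for `float` is a choice made here, not something the docs state.

```python
import re
from datetime import date

# Strict YYYY-MM-DD shape; calendar validity is checked separately below.
_DATE_PATTERN = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_valid_metadata_value(value, metadata_type):
    """Return True if a parsed JSON value fits the given metadata type name."""
    if metadata_type == "string":
        return isinstance(value, str)
    if metadata_type == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if metadata_type == "float":
        # Assumption: integers are accepted where a float is expected.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if metadata_type == "boolean":
        return isinstance(value, bool)
    if metadata_type == "date":
        match = _DATE_PATTERN.fullmatch(value) if isinstance(value, str) else None
        if match is None:
            return False
        try:
            date(*(int(part) for part in match.groups()))
        except ValueError:
            return False  # right shape, but not a real calendar date
        return True
    raise ValueError(f"unknown metadata type: {metadata_type!r}")
```

For example, `is_valid_metadata_value("2021-02-30", "date")` is rejected because the shape matches but the day does not exist.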

Contributor

nit: Metadata types list uses raw HTML <ul>/<li> while the rest of the page uses markdown tables. Minor inconsistency — could use a markdown list or table instead.

| ---------------------------- | ------- | ------------------------ | ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------- |
| `inputDirectory` | path | `./` | `/preprocessing/input/` | Directory containing the input files. |
| `outputDirectory` | path | `./output/` | `/preprocessing/output/` | Directory where SILO writes the preprocessed database state. |
| `ndjsonInputFilename` | path | (none — **required**) | | NDJSON file with the input records, relative to `inputDirectory`. SILO will refuse to start preprocessing if this is unset. |
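
The semantics in these three rows (two keys with defaults, one required key resolved relative to `inputDirectory`) can be sketched in Python as follows. This only mirrors what the table says and is not SILO's implementation; `resolve_preprocessing_paths` and the returned dictionary keys are hypothetical names.

```python
from pathlib import Path

# Defaults taken from the table rows above.
DEFAULTS = {"inputDirectory": "./", "outputDirectory": "./output/"}


def resolve_preprocessing_paths(config):
    """Apply the documented defaults and resolve the NDJSON input path."""
    merged = {**DEFAULTS, **config}
    filename = merged.get("ndjsonInputFilename")
    if not filename:
        # There is no default for this key, so it must be set explicitly.
        raise ValueError("ndjsonInputFilename is required and has no default")
    return {
        # The filename is interpreted relative to inputDirectory.
        "input_file": Path(merged["inputDirectory"]) / filename,
        "output_directory": Path(merged["outputDirectory"]),
    }


paths = resolve_preprocessing_paths(
    {"inputDirectory": "/preprocessing/input/", "ndjsonInputFilename": "data.ndjson"}
)
```

Treating the missing key as an error rather than falling back to a guessed filename matches the "refuse to start preprocessing" wording in the description column.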

Contributor

q: ndjsonInputFilename shown as "(none — required)" in the Default column — slightly confusing since the column header says "Default". Consider just "required" or a footnote to clarify there is no default and the field must be set.

description: Reference on the SILO preprocessing
---

import TsvExample from '../../../../components/TsvExample.astro';

Contributor

nit: The TsvExample.astro component is now orphaned — this was its only import. Consider deleting it in this PR alongside MetadataTypesList.astro.
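
One way to confirm that a component is really orphaned before deleting it is to search the docs sources for its filename. The Python sketch below is a hypothetical helper, not part of the repository; the function name `find_references` and the list of file suffixes are assumptions.

```python
from pathlib import Path


def find_references(root, component_name, suffixes=(".astro", ".mdx", ".md", ".ts", ".tsx")):
    """Return (path, line_number) pairs for lines that mention component_name."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        if path.name == component_name:
            continue  # skip the component file itself
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_number, line in enumerate(text.splitlines(), start=1):
            if component_name in line:
                hits.append((str(path), line_number))
    return hits
```

Called as `find_references("lapis-docs/src", "TsvExample.astro")` from the repository root, an empty result would suggest the component has no remaining imports and can be removed alongside `MetadataTypesList.astro`.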



4 participants