131 changes: 131 additions & 0 deletions .github/instructions/config-method.instructions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,131 @@
---
description: "Use when writing, fixing, or reviewing config.vsh.yaml files in src/methods/ or src/control_methods/. Covers required metadata, info fields, docker engine setup, nextflow runner labels, and how to verify components."
applyTo: "src/methods/**/config.vsh.yaml,src/control_methods/**/config.vsh.yaml"
---
# Method & Control Method Config Guidelines

## Structure Overview

```yaml
__merge__: /src/api/comp_method.yaml  # or comp_control_method.yaml
name: "my_method"                     # snake_case, unique
label: My Method                      # human-readable, used in tables
summary: "One sentence summary."      # used in overview tables
description: |                        # multi-paragraph, used in docs
  Longer description...
references:                           # omit for control methods
  doi:
    - 10.xxxx/xxxxx
links:                                # omit for control methods
  repository: https://github.com/...
  documentation: https://...
info:
  variants:
    my_method_default:
    my_method_variant:
      some_param: value
arguments:                            # only if method has extra params beyond --input/--output
  - name: "--some_param"
    type: integer
    description: "..."
    example: 100                      # use example, NOT default
    info:
      test_default: 1                 # override value used during viash test only
resources:
  - type: r_script                    # or python_script
    path: script.R                    # or script.py
engines:
  - type: docker
    image: openproblems/base_r:1      # see base images below
    setup:
      - type: r
        packages: [package1, package2]
runners:
  - type: executable
  - type: nextflow
    directives:
      label: [midtime, highmem, midcpu]  # adjust to actual needs
```

## Methods vs Control Methods

| Field | Method | Control Method |
|---|---|---|
| `__merge__` | `/src/api/comp_method.yaml` | `/src/api/comp_control_method.yaml` |
| `references` | required | omit |
| `links` | recommended | omit |
| inputs | `--input` (spatial dataset) | `--input` (spatial dataset) |
| extra args | `--base` (domain/tissue, optional) | none |

## Required Metadata Fields

- `name`: unique, matches `[a-z][a-z0-9_]*`
- `label`: short human-readable name
- `summary`: one sentence
- `description`: one or more paragraphs
- `references.doi` (methods only): list of DOIs
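
The rules above can be checked mechanically. The following is a minimal sketch, assuming the config has already been parsed into a plain dict; `check_metadata` and `REQUIRED_KEYS` are hypothetical helper names for illustration and are not part of viash.

```python
import re

# Pattern quoted from the guideline for the `name` field.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")

# Hypothetical list of required top-level keys, taken from the bullets above.
REQUIRED_KEYS = ("name", "label", "summary", "description")


def check_metadata(config: dict, is_control: bool = False) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if not config.get(key)]
    name = config.get("name", "")
    if name and not NAME_PATTERN.fullmatch(name):
        problems.append(f"invalid name: {name!r}")
    # references.doi is required for methods only, not for control methods.
    references = config.get("references") or {}
    if not is_control and not references.get("doi"):
        problems.append("missing field: references.doi")
    return problems
```

Calling `check_metadata(config)` on a method config, or `check_metadata(config, is_control=True)` on a control method config, returns an empty list when all required fields are present and the name is valid.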

## info Section

- `variants`: each key becomes a separate benchmark entry. Override any argument value by nesting it under the variant key. Every method needs at least one variant with the same name as the method.
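
As an illustration of how variant keys relate to benchmark entries, the sketch below turns a parsed `info.variants` mapping into one entry per key, each carrying only its own overrides. The function name `list_benchmark_entries` is hypothetical, and the real expansion happens inside the benchmark workflow, which may differ in detail.

```python
def list_benchmark_entries(variants: dict) -> list:
    """Return one (entry_name, overrides) pair per variant key."""
    entries = []
    for variant_name, overrides in variants.items():
        # A variant with no nested values parses as None in YAML,
        # which means "run with no argument overrides".
        entries.append((variant_name, dict(overrides or {})))
    return entries


# Mirrors the structure overview: one plain variant, one with an override.
variants = {
    "my_method_default": None,
    "my_method_variant": {"some_param": 5},  # 5 is an arbitrary illustrative value
}
entries = list_benchmark_entries(variants)
print(entries)
```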

## Arguments

- Do **not** set `default` on any argument — defaults belong to the library, not the config. Use `example` to document a typical value.
- Use `info.test_default` to override a parameter value **only during `viash test`** (not in benchmarks). This is useful to reduce epoch counts, disable slow quality checks, etc., so tests run quickly without affecting real benchmark results.
- Argument names use `--snake_case`. Viash exposes them in the script as `par['snake_case']` (Python) or `par$snake_case` (R).
- After adding, removing, or renaming any argument, regenerate the `## VIASH START` block in the script so the `par` dict stays in sync:
```bash
viash config inject src/methods/<name>/config.vsh.yaml
```

```yaml
arguments:
  - name: --n_epochs
    type: integer
    description: "Number of training epochs."
    example: 100
    info:
      test_default: 1  # 1 epoch during testing for speed
  - name: --flow_threshold
    type: double
    description: "Flow error threshold. Set to 0 to skip flow quality check."
    example: 0.4
    info:
      test_default: 0  # skip check during testing
```
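
To make the role of `info.test_default` concrete, here is a small sketch that collects the test-only overrides from a parsed `arguments` list. It assumes the YAML has already been loaded into Python dicts; `resolve_test_values` is a hypothetical helper, the `--seed` argument is made up for the example, and the actual test harness may apply the overrides differently.

```python
def resolve_test_values(arguments: list) -> dict:
    """Collect the overrides that apply only during `viash test`."""
    overrides = {}
    for arg in arguments:
        info = arg.get("info") or {}
        if "test_default" in info:
            # "--n_epochs" in the config becomes the key "n_epochs" in `par`.
            key = arg["name"].lstrip("-")
            overrides[key] = info["test_default"]
    return overrides


arguments = [
    {"name": "--n_epochs", "type": "integer", "example": 100, "info": {"test_default": 1}},
    {"name": "--flow_threshold", "type": "double", "example": 0.4, "info": {"test_default": 0}},
    {"name": "--seed", "type": "integer", "example": 42},  # hypothetical, no test override
]
test_values = resolve_test_values(arguments)
print(test_values)  # {'n_epochs': 1, 'flow_threshold': 0}
```

Arguments without `info.test_default` are left out, so they keep whatever value the library itself would use.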

## Base Docker Images

| Image | Use for |
|---|---|
| `openproblems/base_python:1` | Python, CPU |
| `openproblems/base_r:1` | R, CPU |
| `openproblems/base_pytorch_nvidia:1` | PyTorch + NVIDIA GPU |
| `openproblems/base_tensorflow_nvidia:1` | TensorFlow + NVIDIA GPU |

## Nextflow Runner Labels

Set in `runners[type=nextflow].directives.label`. Pick one from each category:

| Category | Options |
|---|---|
| Time | `lowtime`, `midtime`, `hightime` |
| Memory | `lowmem`, `midmem`, `highmem`, `veryhighmem` |
| CPU | `lowcpu`, `midcpu`, `highcpu` |
| GPU (optional) | `gpu`, `biggpu` |
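
The "one from each category" rule can be expressed as a short check. This is a sketch only: the label sets are copied from the table above, while `check_labels` is a hypothetical helper and not something viash or Nextflow provides.

```python
# Label options copied from the table above.
LABEL_CATEGORIES = {
    "time": {"lowtime", "midtime", "hightime"},
    "memory": {"lowmem", "midmem", "highmem", "veryhighmem"},
    "cpu": {"lowcpu", "midcpu", "highcpu"},
}
GPU_LABELS = {"gpu", "biggpu"}


def check_labels(labels: list) -> list:
    """Return a list of problems; an empty list means the labels are valid."""
    problems = []
    for category, options in LABEL_CATEGORIES.items():
        chosen = [label for label in labels if label in options]
        if len(chosen) != 1:
            problems.append(f"{category}: expected exactly 1 label, found {len(chosen)}")
    # The GPU label is optional, but at most one makes sense.
    if len([label for label in labels if label in GPU_LABELS]) > 1:
        problems.append("gpu: expected at most 1 label")
    known = set().union(*LABEL_CATEGORIES.values(), GPU_LABELS)
    problems += [f"unknown label: {label}" for label in labels if label not in known]
    return problems
```

For example, `check_labels(["midtime", "highmem", "midcpu"])` returns an empty list, while omitting the memory label reports that category as missing.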

## Rebuilding the Docker Image

After changing the `setup` section:
```bash
viash run src/methods/<name>/config.vsh.yaml -- ---setup cachedbuild
```

## Verification

```bash
viash test src/methods/<name>/config.vsh.yaml
```

Both test scripts must succeed (`2 out of 2`).
93 changes: 93 additions & 0 deletions .github/instructions/config-metric.instructions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
---
description: "Use when writing, fixing, or reviewing config.vsh.yaml files in src/metrics/. Covers required metadata, the info.metrics list structure, docker engine setup, nextflow runner labels, and how to verify components."
applyTo: "src/metrics/**/config.vsh.yaml"
---
# Metric Config Guidelines

## Structure Overview

Metrics differ from methods: metadata (`label`, `summary`, `description`, `references`) lives inside the `info.metrics` list, not at the top level. A single component can expose multiple metric values.

```yaml
__merge__: /src/api/comp_metric.yaml
name: "my_metric"                       # snake_case, unique component name
info:
  metrics:
    - name: my_metric_value1            # snake_case, unique metric name
      label: My Metric Value 1          # human-readable, used in tables
      summary: "One sentence summary."
      description: "Longer description."
      references:
        doi: 10.xxxx/xxxxx
      min: 0
      max: 1
      maximize: true                    # true if higher = better
    - name: my_metric_value2
      label: My Metric Value 2
      summary: "..."
      description: "..."
      references:
        doi: 10.xxxx/xxxxx
      min: 0
      max: 1
      maximize: false
resources:
  - type: python_script                 # or r_script
    path: script.py                     # or script.R
engines:
  - type: docker
    image: openproblems/base_python:1   # see base images below
    setup:
      - type: python
        packages: [scikit-learn]
runners:
  - type: executable
  - type: nextflow
    directives:
      label: [midtime, midmem, midcpu]
```

## Required Fields per Metric Entry

Each entry in `info.metrics` must have:
- `name`: unique metric identifier, snake_case
- `label`: short human-readable name
- `summary`: one sentence
- `description`: full description
- `references.doi`: DOI(s) for the metric
- `min` / `max`: numeric range of possible values
- `maximize`: `true` if higher score = better performance
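
The per-entry requirements above can be sketched as a small validator. It assumes each `info.metrics` entry has been parsed into a dict; `check_metric_entry` and `REQUIRED_METRIC_KEYS` are hypothetical names used only for this illustration.

```python
# Required keys copied from the list above.
REQUIRED_METRIC_KEYS = (
    "name", "label", "summary", "description",
    "references", "min", "max", "maximize",
)


def check_metric_entry(entry: dict) -> list:
    """Return a list of problems for one `info.metrics` entry."""
    problems = [f"missing field: {key}" for key in REQUIRED_METRIC_KEYS if key not in entry]
    if "references" in entry and not (entry["references"] or {}).get("doi"):
        problems.append("missing field: references.doi")
    if "maximize" in entry and not isinstance(entry["maximize"], bool):
        problems.append("maximize must be true or false")
    if "min" in entry and "max" in entry and entry["min"] > entry["max"]:
        problems.append("min must not exceed max")
    return problems
```

Running it over every entry in `info.metrics` before `viash test` gives a quick signal about missing or inconsistent fields.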

## Base Docker Images

| Image | Use for |
|---|---|
| `openproblems/base_python:1` | Python, CPU |
| `openproblems/base_r:1` | R, CPU |

Metrics rarely need GPU images.

## Nextflow Runner Labels

Metrics are typically lightweight, so start with conservative labels (the example above uses `midtime`, `midmem`, `midcpu`). Pick one from each category:

| Category | Options |
|---|---|
| Time | `lowtime`, `midtime`, `hightime` |
| Memory | `lowmem`, `midmem`, `highmem` |
| CPU | `lowcpu`, `midcpu`, `highcpu` |

## Rebuilding the Docker Image

After changing the `setup` section:
```bash
viash run src/metrics/<name>/config.vsh.yaml -- ---setup cachedbuild
```

## Verification

```bash
viash test src/metrics/<name>/config.vsh.yaml
```

Both test scripts must succeed (`2 out of 2`).
97 changes: 97 additions & 0 deletions .github/instructions/method-scripts-python.instructions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
---
description: "Use when writing, fixing, or reviewing method/metric script.py files in src/methods/, src/metrics/, or src/control_methods/. Covers script style, API compliance, and how to verify components."
applyTo: "src/methods/**/script.py,src/metrics/**/script.py,src/control_methods/**/script.py"
---
# Method & Metric Script Guidelines (Python)

## Core Principle

`script.py` should represent **typical bioinformatician usage** of the tool with minimal modifications. Only adapt what is strictly necessary to:
1. Read inputs from the paths provided by `par`
2. Pass the right data structures to the method
3. Convert the method's output back into the expected output structures
4. Write outputs to `par['output']`

Do **not** restructure the method's native API, add abstraction layers, or rewrite the algorithm logic.

## Finding API Specs

Input/output file formats are defined in `src/api/`. Key files:
- `file_dataset_sp.yaml` — spatial dataset input format (contains `layers['counts']`, `obs['row']`, `obs['col']`, `obs['spatial_cluster']`, etc.)
- `file_dataset_sc.yaml` — single-cell dataset input format (metrics only)
- `file_simulated_dataset.yaml` — expected output format for methods
- `file_score.yaml` — expected output format for metrics
- `comp_method.yaml`, `comp_control_method.yaml`, `comp_metric.yaml` — component argument specs

Always check these before deciding what fields to read or write.

## The `## VIASH START` / `## VIASH END` Block

This block is **auto-generated** by viash from the component's `config.vsh.yaml` arguments. It is replaced at build/test time with a real CLI parser. Keep it in the script only as a local development convenience.

- **Do not edit it manually** to add or remove parameters — edit `config.vsh.yaml` instead.
- After adding, removing, or renaming an argument in the config, regenerate the block:
```bash
viash config inject src/methods/<name>/config.vsh.yaml
```
- Argument names in the config (`--my_param`) map directly to `par['my_param']` keys.
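
For orientation, a block of this kind typically looks like the sketch below. The paths and the `my_param` value are placeholders chosen for illustration; the real block is generated from the config, and its exact contents are an assumption here.

```python
## VIASH START
# Placeholder values for local development only; viash replaces this block
# with a real CLI parser at build/test time. Paths and values are hypothetical.
par = {
    "input": "path/to/input.h5ad",
    "output": "output.h5ad",
    "my_param": 100,
}
meta = {
    "name": "my_method",
}
## VIASH END

# Below the block, the script reads its arguments through `par` only.
print(f"{meta['name']}: my_param={par['my_param']}")
```

Because everything outside the block goes through `par` and `meta`, the same script runs unchanged whether it is executed directly or through viash.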

## Common Patterns

**Method: reading input:**
```python
import anndata as ad
input = ad.read_h5ad(par['input'])
```

**Method: writing simulated dataset output:**
```python
output = ad.AnnData(
    layers={"counts": simulated_counts},  # integer matrix, cells x genes
    obs=input.obs[["row", "col"]],
    var=input.var,
    uns={
        **input.uns,
        "method_id": meta["name"],
    },
)
output.write_h5ad(par['output'], compression="gzip")
```

**Metric: reading inputs:**
```python
import anndata as ad
input_spatial_dataset = ad.read_h5ad(par['input_spatial_dataset'])
input_singlecell_dataset = ad.read_h5ad(par['input_singlecell_dataset'])
input_simulated_dataset = ad.read_h5ad(par['input_simulated_dataset'])
```

**Metric: writing score output:**
```python
output = ad.AnnData(
    uns={
        "dataset_id": input_simulated_dataset.uns["dataset_id"],
        "method_id": input_simulated_dataset.uns["method_id"],
        "metric_ids": ["metric_name_1", "metric_name_2"],
        "metric_values": [score1, score2],
    },
)
output.write_h5ad(par['output'], compression="gzip")
```
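
One way to keep `metric_ids` and `metric_values` aligned is to build both from a single mapping. The sketch below uses only the standard library; `build_score_uns` is a hypothetical helper, and the expectation that the ids match the `info.metrics` names in the config is an assumption rather than something stated in the API files.

```python
def build_score_uns(dataset_id: str, method_id: str, scores: dict) -> dict:
    """Build the `uns` dict for a score file from a {metric_id: value} mapping."""
    metric_ids = list(scores)
    # Deriving both lists from the same dict keeps them the same length
    # and in the same order.
    metric_values = [float(scores[metric_id]) for metric_id in metric_ids]
    return {
        "dataset_id": dataset_id,
        "method_id": method_id,
        "metric_ids": metric_ids,
        "metric_values": metric_values,
    }


uns = build_score_uns(
    "my_dataset",  # placeholder ids for illustration
    "my_method",
    {"metric_name_1": 0.5, "metric_name_2": 1},
)
print(uns)
```

The resulting dict can be passed as `uns=` to `ad.AnnData(...)` in place of the literal shown above.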

## Dependency Fixes

If a library has a dependency conflict (e.g., incompatible with newer `anndata`, `numpy`, etc.), prefer replacing it with an alternative that provides the same model/algorithm natively rather than pinning transitive dependencies.

Update `config.vsh.yaml` to remove the broken package from the `setup` block when replacing it.

## Verification

After any change to a method or metric script or its config, verify with:
```bash
viash test src/methods/<name>/config.vsh.yaml
# or
viash test src/metrics/<name>/config.vsh.yaml
```

Both test scripts must succeed (`2 out of 2`).