Publish kegg116 KEGG artefacts (v0.1.0)#28

Merged
edkerk merged 2 commits into develop from release/0.1.0-kegg116
Jun 10, 2026
@edkerk edkerk commented Jun 10, 2026

Member

See CHANGELOG 0.1.0

edkerk added 2 commits June 10, 2026 22:27
First downloadable KEGG artefact set, wired into the runtime resolvers:

- All artefacts are gzip and version-prefixed (kegg116_<name>.gz) so MATLAB and
  Windows read them with the built-in gunzip, no external tool. organism_gene_ko
  moves from xz to gzip for the same reason.
- HMM libraries ship as one gzip concatenated flatfile per domain;
  ensure_kegg_hmm_library decompresses and hmmpresses on first use, ~10x smaller
  than the pressed index and portable across HMMER versions.
- Add a version-prefix-tolerant artefact resolver (_resolve_artefact) used by the
  organism/sequence entry points; parse_kegg_dump and build_kegg_artefacts.py gain
  an opt-in --version.
- Populate data/manifest.json and _DATA_REGISTRY with the kegg116 release assets
  (real SHA256 + bytes); refresh the maintainer docs and manifest example.
- Bump version to 0.1.0 and update CHANGELOG.
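
A version-prefix-tolerant lookup like the one described above could be sketched as follows. This is not the project's `_resolve_artefact`; the function name `resolve_artefact`, its signature, and the preference order (exact version, then unprefixed file, then highest `kegg<N>_` prefix) are assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import Optional


def resolve_artefact(data_dir: Path, name: str, version: Optional[str] = None) -> Path:
    """Locate `<name>.gz` in data_dir, with or without a `kegg<N>_` prefix.

    Hypothetical helper: the lookup order below is an assumption,
    not the behaviour of the real resolver.
    """
    data_dir = Path(data_dir)
    if version is not None:
        # An explicit version pins the lookup to exactly one file name.
        candidates = [data_dir / f"{version}_{name}.gz"]
    else:
        # Accept only names of the exact form kegg<digits>_<name>.gz.
        pattern = re.compile(rf"kegg(\d+)_{re.escape(name)}\.gz")
        prefixed = []
        for path in data_dir.glob(f"*_{name}.gz"):
            match = pattern.fullmatch(path.name)
            if match:
                prefixed.append((int(match.group(1)), path))
        # Sort numerically so kegg116 outranks kegg99, newest first.
        prefixed.sort(key=lambda item: item[0], reverse=True)
        candidates = [data_dir / f"{name}.gz"] + [path for _, path in prefixed]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no artefact for {name!r} in {data_dir}")
```

Sorting on the parsed release number rather than the file name avoids the lexical trap where `kegg99_...` would sort after `kegg116_...`.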

Publish kegg116_taxonomy.gz and regenerate RAVEN's keggPhylDist from it, so GECKO's
organism-distance kcat selection needs no MATLAB .mat file:

- reconstruction.kegg.phyl_dist + PhylDist faithfully reproduce RAVEN getPhylDist's
  (asymmetric, occasionally negative) distance metric; parse_taxonomy_records exposes
  ids/names/lineages and reads .gz transparently.
- data.ensure_kegg_taxonomy fetches the artefact; build_kegg_artefacts.py emits it.
- Register kegg116_taxonomy.gz in data/manifest.json and _DATA_REGISTRY (8 files).
- Tests for phyl_dist (hand-checked against RAVEN) and the taxonomy fetch; update
  migration/IMPROVEMENTS/maintainer docs and CHANGELOG.
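
Reading `.gz` transparently, as mentioned for `parse_taxonomy_records` above, can be done by checking the gzip magic bytes instead of trusting the file extension. The sketch below is a generic illustration, not the project's parser; `open_text_maybe_gzip` and `iter_lines` are hypothetical names, and UTF-8 encoding is an assumption.

```python
import gzip
from pathlib import Path
from typing import IO, Iterator, Union

_GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_text_maybe_gzip(path: Union[str, Path]) -> IO[str]:
    """Open path as text, decompressing on the fly if it is gzip data."""
    path = Path(path)
    with path.open("rb") as raw:
        is_gzip = raw.read(2) == _GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return path.open("rt", encoding="utf-8")


def iter_lines(path: Union[str, Path]) -> Iterator[str]:
    """Yield lines without trailing newlines from a plain or gzip text file."""
    with open_text_maybe_gzip(path) as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Sniffing the header means a caller can pass either the downloaded artefact or an already decompressed copy without changing any arguments.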
@edkerk edkerk merged commit c5a8e67 into develop Jun 10, 2026
6 checks passed
@edkerk edkerk deleted the release/0.1.0-kegg116 branch June 10, 2026 22:05