Publish kegg116 KEGG artefacts (v0.1.0) #28
Merged
First downloadable KEGG artefact set, wired into the runtime resolvers:
- All artefacts are gzip-compressed and version-prefixed (kegg116_<name>.gz), so MATLAB and Windows can read them with the built-in gunzip and need no external tool. organism_gene_ko moves from xz to gzip for the same reason.
- HMM libraries ship as one gzip-compressed, concatenated flatfile per domain; ensure_kegg_hmm_library decompresses it and runs hmmpress on first use. The flatfile is ~10x smaller than the pressed index and portable across HMMER versions.
- Add a version-prefix-tolerant artefact resolver (_resolve_artefact) used by the organism/sequence entry points; parse_kegg_dump and build_kegg_artefacts.py gain an opt-in --version option.
- Populate data/manifest.json and _DATA_REGISTRY with the kegg116 release assets (real SHA256 + bytes); refresh the maintainer docs and manifest example.
- Bump version to 0.1.0 and update CHANGELOG.
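For reviewers, the version-prefix-tolerant lookup described above can be pictured roughly as follows. This is only a sketch and not the code in this PR: the function name `resolve_artefact`, its signature, and the assumed prefix pattern `kegg<digits>_` are illustrative guesses, and the real `_resolve_artefact` may behave differently.

```python
import re
from pathlib import Path
from typing import Optional, Union


def resolve_artefact(
    data_dir: Union[str, Path], name: str, version: Optional[str] = None
) -> Path:
    """Hypothetical stand-in for _resolve_artefact (real signature not shown here).

    Looks for '<version>_<name>.gz' first, then falls back to any
    'kegg<digits>_<name>.gz' or bare '<name>.gz' in the same directory.
    """
    data_dir = Path(data_dir)

    # An explicitly requested version wins when that exact file exists.
    if version is not None:
        exact = data_dir / f"{version}_{name}.gz"
        if exact.is_file():
            return exact

    # Assumed naming scheme: optional 'kegg<digits>_' prefix, then the name.
    # fullmatch keeps 'gene_ko' from matching 'kegg116_organism_gene_ko.gz'.
    pattern = re.compile(rf"(?:kegg(\d+)_)?{re.escape(name)}\.gz")

    best_path = None
    best_release = -2
    for path in data_dir.iterdir():
        match = pattern.fullmatch(path.name)
        if not (match and path.is_file()):
            continue
        # Unprefixed files rank below every versioned file.
        release = int(match.group(1)) if match.group(1) else -1
        if release > best_release:
            best_path, best_release = path, release

    if best_path is None:
        raise FileNotFoundError(f"no artefact named {name!r} in {data_dir}")
    return best_path
```

With `kegg115_taxonomy.gz` and `kegg116_taxonomy.gz` side by side, `resolve_artefact(data_dir, "taxonomy")` would pick the kegg116 file, while passing `version="kegg115"` would pin the older one.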
Publish kegg116_taxonomy.gz and regenerate RAVEN's keggPhylDist from it, so GECKO's organism-distance kcat selection needs no MATLAB .mat file:
- reconstruction.kegg.phyl_dist and PhylDist faithfully reproduce the (asymmetric, occasionally negative) distance metric of RAVEN's getPhylDist; parse_taxonomy_records exposes ids/names/lineages and reads .gz transparently.
- data.ensure_kegg_taxonomy fetches the artefact; build_kegg_artefacts.py emits it.
- Register kegg116_taxonomy.gz in data/manifest.json and _DATA_REGISTRY (8 files).
- Add tests for phyl_dist (hand-checked against RAVEN) and the taxonomy fetch; update migration/IMPROVEMENTS/maintainer docs and CHANGELOG.
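As a rough illustration of what "reads .gz transparently" can mean, the sketch below picks the opener from the file's first two bytes rather than its extension. The helper name `iter_text_lines` and the UTF-8 encoding are assumptions made for this example; the PR's `parse_taxonomy_records` may detect compression differently.

```python
import gzip
from pathlib import Path
from typing import Iterator, Union

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def iter_text_lines(path: Union[str, Path]) -> Iterator[str]:
    """Yield lines from a plain or gzip-compressed text file (hypothetical helper)."""
    path = Path(path)

    # Sniff the header so a misnamed file is still read correctly.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC

    opener = gzip.open if is_gzip else open
    # UTF-8 is an assumption; the artefact's actual encoding is not stated.
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Sniffing the magic bytes means callers can pass either the downloaded kegg116_taxonomy.gz or an already decompressed copy without changing the call.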
See CHANGELOG 0.1.0