🧫 A curated list of resources relevant to doing Biomedical Information Extraction (including BioNLP)
Updated May 26, 2026
An R package with over 50 highly cited, ready-to-use, up-to-date COVID-19 pandemic data resources
MCP server for Open Targets data
BERT finetuned on NER downstream tasks
Measuring and visualizing biomedical data variability/heterogeneity across data sources
Synthetic biomedical data generator for reproducible benchmarking of feature selection methods in high-dimensional machine learning.
A toolkit for processing radiobiological Excel data: visualization of tumor growth and skin reactions, statistics, and an interactive GUI (PyQt6). Supports estimation of LQ-model parameters (α/β) and comparison of experiments.
Bioinformatics Classifier Project — TCGA BRCA Dataset. Exploratory analysis and machine learning classification on TCGA BRCA gene expression data, focusing on PAM50 breast cancer subtypes.
Three basic data analysis workflows for biomedical data in Python. Level: beginner (~200 lines of pure code).
Machine learning system for early Parkinson’s disease prediction using multimodal biomedical data (voice and handwriting) with Random Forest and EfficientNet models.
Healthcare AI project analyzing migraine treatment outcomes using longitudinal statistical models in R.
Step1-Step6 preprocessing workflow and final FAERS compound-PT-SOC core graph releases with standardized compounds, MedDRA PT/SOC mapping, and three pruned graph versions.
Multiclass classification of breast cancer subtypes using gene expression profiles. Evaluates and compares multiple models (Logistic Regression, Random Forest, HistGradientBoosting) on synthetically generated data using classification metrics, confusion matrices, and ROC-AUC analysis with Youden’s J statistic.
Project focused on exploring and modeling the T1DiabetesGranada dataset, which contains clinical, biochemical, and continuous glucose monitoring (CGM) data from patients with Type 1 Diabetes.
This repository contains the data and analysis code for the study "Machine Learning-driven biomarker discovery for stratifying treatment response in tick-borne illness". It investigates the identification of robust and reproducible baseline predictors of treatment response using a stability-aware, multi-method machine learning framework.
Multiclass classification of breast cancer subtypes using synthetic gene expression data. Refactored code to use a single function for model evaluation across Logistic Regression, Random Forest, and HistGradientBoosting, including metrics and ROC-AUC with Youden’s J statistic.
Machine learning for Raman spectra analysis of brain tissue with robust preprocessing, classification, and interpretable biomarker discovery
A lightweight R script for text mining and harmonizing medical phenotype data. Cleans, standardizes, and maps diagnoses to ICD-10 codes, with clinical annotations for enhanced data usability.
Biomedical data science project focused on glioma mutation analysis and tumor grade prediction.
Big Data + ML clustering of 1.6M weather records with Apache Spark and Databricks