
Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox


Bibliographic Details
Main authors: Wirbel, Jakob; Zych, Konrad; Essex, Morgan; Karcher, Nicolai; Kartal, Ece; Salazar, Guillem; Bork, Peer; Sunagawa, Shinichi; Zeller, Georg
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8008609/
https://www.ncbi.nlm.nih.gov/pubmed/33785070
http://dx.doi.org/10.1186/s13059-021-02306-1
Description
Summary: The human microbiome is increasingly mined for diagnostic and therapeutic biomarkers using machine learning (ML). However, metagenomics-specific software is scarce, and overoptimistic evaluation and limited cross-study generalization are prevailing issues. To address these, we developed SIAMCAT, a versatile R toolbox for ML-based comparative metagenomics. We demonstrate its capabilities in a meta-analysis of fecal metagenomic studies (10,803 samples). When naively transferred across studies, ML models lost accuracy and disease specificity, which could, however, be resolved by a novel training set augmentation strategy. This reveals some biomarkers to be disease-specific, with others shared across multiple conditions. SIAMCAT is freely available from siamcat.embl.de. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-021-02306-1.