Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights
Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated...
Main authors: | Pasolli, Edoardo; Truong, Duy Tin; Malik, Faizan; Waldron, Levi; Segata, Nicola |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4939962/ https://www.ncbi.nlm.nih.gov/pubmed/27400279 http://dx.doi.org/10.1371/journal.pcbi.1004977 |
_version_ | 1782442079726600192 |
---|---|
author | Pasolli, Edoardo Truong, Duy Tin Malik, Faizan Waldron, Levi Segata, Nicola |
author_sort | Pasolli, Edoardo |
collection | PubMed |
description | Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the “healthy” microbiome can be considered a first step toward defining general microbial dysbiosis. The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml. |
format | Online Article Text |
id | pubmed-4939962 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4939962 2016-07-22 Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights Pasolli, Edoardo Truong, Duy Tin Malik, Faizan Waldron, Levi Segata, Nicola PLoS Comput Biol Research Article Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the “healthy” microbiome can be considered a first step toward defining general microbial dysbiosis. The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml. Public Library of Science 2016-07-11 /pmc/articles/PMC4939962/ /pubmed/27400279 http://dx.doi.org/10.1371/journal.pcbi.1004977 Text en © 2016 Pasolli et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Pasolli, Edoardo Truong, Duy Tin Malik, Faizan Waldron, Levi Segata, Nicola Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights |
title | Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights |
title_sort | machine learning meta-analysis of large metagenomic datasets: tools and biological insights |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4939962/ https://www.ncbi.nlm.nih.gov/pubmed/27400279 http://dx.doi.org/10.1371/journal.pcbi.1004977 |
work_keys_str_mv | AT pasolliedoardo machinelearningmetaanalysisoflargemetagenomicdatasetstoolsandbiologicalinsights AT truongduytin machinelearningmetaanalysisoflargemetagenomicdatasetstoolsandbiologicalinsights AT malikfaizan machinelearningmetaanalysisoflargemetagenomicdatasetstoolsandbiologicalinsights AT waldronlevi machinelearningmetaanalysisoflargemetagenomicdatasetstoolsandbiologicalinsights AT segatanicola machinelearningmetaanalysisoflargemetagenomicdatasetstoolsandbiologicalinsights |