
Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction

Microbiome composition profiles generated from 16S rRNA sequencing have been extensively studied for their usefulness in phenotype trait prediction, including for complex diseases such as diabetes and obesity. These microbiome compositions have typically been quantified in the form of Operational Ta...


Bibliographic Details
Main Authors: Song, Kuncheng; Wright, Fred A.; Zhou, Yi-Hui
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Molecular Biosciences
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7772236/
https://www.ncbi.nlm.nih.gov/pubmed/33392266
http://dx.doi.org/10.3389/fmolb.2020.610845
author Song, Kuncheng
Wright, Fred A.
Zhou, Yi-Hui
collection PubMed
description Microbiome composition profiles generated from 16S rRNA sequencing have been extensively studied for their usefulness in phenotype trait prediction, including for complex diseases such as diabetes and obesity. These microbiome compositions have typically been quantified in the form of Operational Taxonomic Unit (OTU) count matrices. However, alternate approaches such as Amplicon Sequence Variants (ASV) have been used, as well as the direct use of k-mer sequence counts. The overall effect of these different types of predictors when used in concert with various machine learning methods has been difficult to assess, due to varied combinations described in the literature. Here we provide an in-depth investigation of more than 1,000 combinations of these three clustering/counting methods, in combination with varied choices for normalization and filtering, grouping at various taxonomic levels, and the use of more than ten commonly used machine learning methods for phenotype prediction. The use of short k-mers, which have computational advantages and conceptual simplicity, is shown to be effective as a source for microbiome-based prediction. Among machine-learning approaches, tree-based methods show consistent, though modest, advantages in prediction accuracy. We describe the various advantages and disadvantages of combinations in analysis approaches, and provide general observations to serve as a useful guide for future trait-prediction explorations using microbiome data.
format Online
Article
Text
id pubmed-7772236
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7772236 2020-12-31 Front Mol Biosci Molecular Biosciences Frontiers Media S.A. 2020-12-16 Text en Copyright © 2020 Song, Wright and Zhou.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction
topic Molecular Biosciences