Cargando…

Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction

Microbiome composition profiles generated from 16S rRNA sequencing have been extensively studied for their usefulness in phenotype trait prediction, including for complex diseases such as diabetes and obesity. These microbiome compositions have typically been quantified in the form of Operational Ta...

Descripción completa

Detalles Bibliográficos
Autores principales:	Song, Kuncheng, Wright, Fred A., Zhou, Yi-Hui
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Frontiers Media S.A. 2020
Materias:	Molecular Biosciences
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7772236/ https://www.ncbi.nlm.nih.gov/pubmed/33392266 http://dx.doi.org/10.3389/fmolb.2020.610845

_version_	1783629833118416896
author	Song, Kuncheng Wright, Fred A. Zhou, Yi-Hui
author_facet	Song, Kuncheng Wright, Fred A. Zhou, Yi-Hui
author_sort	Song, Kuncheng
collection	PubMed
description	Microbiome composition profiles generated from 16S rRNA sequencing have been extensively studied for their usefulness in phenotype trait prediction, including for complex diseases such as diabetes and obesity. These microbiome compositions have typically been quantified in the form of Operational Taxonomic Unit (OTU) count matrices. However, alternate approaches such as Amplicon Sequence Variants (ASV) have been used, as well as the direct use of k-mer sequence counts. The overall effect of these different types of predictors when used in concert with various machine learning methods has been difficult to assess, due to varied combinations described in the literature. Here we provide an in-depth investigation of more than 1,000 combinations of these three clustering/counting methods, in combination with varied choices for normalization and filtering, grouping at various taxonomic levels, and the use of more than ten commonly used machine learning methods for phenotype prediction. The use of short k-mers, which have computational advantages and conceptual simplicity, is shown to be effective as a source for microbiome-based prediction. Among machine-learning approaches, tree-based methods show consistent, though modest, advantages in prediction accuracy. We describe the various advantages and disadvantages of combinations in analysis approaches, and provide general observations to serve as a useful guide for future trait-prediction explorations using microbiome data.
format	Online Article Text
id	pubmed-7772236
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-77722362020-12-31 Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction Song, Kuncheng Wright, Fred A. Zhou, Yi-Hui Front Mol Biosci Molecular Biosciences Microbiome composition profiles generated from 16S rRNA sequencing have been extensively studied for their usefulness in phenotype trait prediction, including for complex diseases such as diabetes and obesity. These microbiome compositions have typically been quantified in the form of Operational Taxonomic Unit (OTU) count matrices. However, alternate approaches such as Amplicon Sequence Variants (ASV) have been used, as well as the direct use of k-mer sequence counts. The overall effect of these different types of predictors when used in concert with various machine learning methods has been difficult to assess, due to varied combinations described in the literature. Here we provide an in-depth investigation of more than 1,000 combinations of these three clustering/counting methods, in combination with varied choices for normalization and filtering, grouping at various taxonomic levels, and the use of more than ten commonly used machine learning methods for phenotype prediction. The use of short k-mers, which have computational advantages and conceptual simplicity, is shown to be effective as a source for microbiome-based prediction. Among machine-learning approaches, tree-based methods show consistent, though modest, advantages in prediction accuracy. We describe the various advantages and disadvantages of combinations in analysis approaches, and provide general observations to serve as a useful guide for future trait-prediction explorations using microbiome data. Frontiers Media S.A. 2020-12-16 /pmc/articles/PMC7772236/ /pubmed/33392266 http://dx.doi.org/10.3389/fmolb.2020.610845 Text en Copyright © 2020 Song, Wright and Zhou. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Molecular Biosciences Song, Kuncheng Wright, Fred A. Zhou, Yi-Hui Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction
title	Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction
title_full	Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction
title_fullStr	Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction
title_full_unstemmed	Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction
title_short	Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction
title_sort	systematic comparisons for composition profiles, taxonomic levels, and machine learning methods for microbiome-based disease prediction
topic	Molecular Biosciences
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7772236/ https://www.ncbi.nlm.nih.gov/pubmed/33392266 http://dx.doi.org/10.3389/fmolb.2020.610845
work_keys_str_mv	AT songkuncheng systematiccomparisonsforcompositionprofilestaxonomiclevelsandmachinelearningmethodsformicrobiomebaseddiseaseprediction AT wrightfreda systematiccomparisonsforcompositionprofilestaxonomiclevelsandmachinelearningmethodsformicrobiomebaseddiseaseprediction AT zhouyihui systematiccomparisonsforcompositionprofilestaxonomiclevelsandmachinelearningmethodsformicrobiomebaseddiseaseprediction

Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction

Ejemplares similares