Clustering on Human Microbiome Sequencing Data: A Distance-Based Unsupervised Learning Model

Modeling and analyzing the human microbiome allows assessment of the microbial community and its impact on human health. Microbiome composition can be quantified with 16S rRNA sequencing, which yields data that are usually skewed and heavy-tailed with excess zeros. Clustering methods are usefu...

Bibliographic Details
Main authors: Yang, Dongyang; Xu, Wei
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7589204/
https://www.ncbi.nlm.nih.gov/pubmed/33092203
http://dx.doi.org/10.3390/microorganisms8101612
_version_ 1783600524836208640
author Yang, Dongyang
Xu, Wei
author_facet Yang, Dongyang
Xu, Wei
author_sort Yang, Dongyang
collection PubMed
description Modeling and analyzing the human microbiome allows assessment of the microbial community and its impact on human health. Microbiome composition can be quantified with 16S rRNA sequencing, which yields data that are usually skewed and heavy-tailed with excess zeros. Clustering methods are useful in personalized medicine because they identify subgroups for patient stratification. However, there is currently no standardized clustering method for complex microbiome sequencing data. We propose a clustering algorithm with a specific beta diversity measure that can address the presence-absence bias encountered in sparse count data and effectively measure the distances between samples for stratification. The distance measure used for clustering is derived from a parametric mixture model that produces sample-specific distributions conditional on the observed operational taxonomic unit (OTU) counts and the estimated mixture weights. The method can provide accurate estimates of the true zero proportions and thus construct a precise beta diversity measure. Extensive simulation studies suggest that the proposed method substantially improves clustering compared with some widely used distance measures when a large proportion of zeros is present. The proposed algorithm was applied to a human gut microbiome study of Parkinson’s disease to identify distinct microbiome states with biological interpretations.
format Online
Article
Text
id pubmed-7589204
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-75892042020-10-29 Clustering on Human Microbiome Sequencing Data: A Distance-Based Unsupervised Learning Model Yang, Dongyang Xu, Wei Microorganisms Article Modeling and analyzing the human microbiome allows assessment of the microbial community and its impact on human health. Microbiome composition can be quantified with 16S rRNA sequencing, which yields data that are usually skewed and heavy-tailed with excess zeros. Clustering methods are useful in personalized medicine because they identify subgroups for patient stratification. However, there is currently no standardized clustering method for complex microbiome sequencing data. We propose a clustering algorithm with a specific beta diversity measure that can address the presence-absence bias encountered in sparse count data and effectively measure the distances between samples for stratification. The distance measure used for clustering is derived from a parametric mixture model that produces sample-specific distributions conditional on the observed operational taxonomic unit (OTU) counts and the estimated mixture weights. The method can provide accurate estimates of the true zero proportions and thus construct a precise beta diversity measure. Extensive simulation studies suggest that the proposed method substantially improves clustering compared with some widely used distance measures when a large proportion of zeros is present. The proposed algorithm was applied to a human gut microbiome study of Parkinson’s disease to identify distinct microbiome states with biological interpretations. MDPI 2020-10-20 /pmc/articles/PMC7589204/ /pubmed/33092203 http://dx.doi.org/10.3390/microorganisms8101612 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Yang, Dongyang
Xu, Wei
Clustering on Human Microbiome Sequencing Data: A Distance-Based Unsupervised Learning Model
title Clustering on Human Microbiome Sequencing Data: A Distance-Based Unsupervised Learning Model
title_full Clustering on Human Microbiome Sequencing Data: A Distance-Based Unsupervised Learning Model
title_fullStr Clustering on Human Microbiome Sequencing Data: A Distance-Based Unsupervised Learning Model
title_full_unstemmed Clustering on Human Microbiome Sequencing Data: A Distance-Based Unsupervised Learning Model
title_short Clustering on Human Microbiome Sequencing Data: A Distance-Based Unsupervised Learning Model
title_sort clustering on human microbiome sequencing data: a distance-based unsupervised learning model
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7589204/
https://www.ncbi.nlm.nih.gov/pubmed/33092203
http://dx.doi.org/10.3390/microorganisms8101612
work_keys_str_mv AT yangdongyang clusteringonhumanmicrobiomesequencingdataadistancebasedunsupervisedlearningmodel
AT xuwei clusteringonhumanmicrobiomesequencingdataadistancebasedunsupervisedlearningmodel