Clustering on Human Microbiome Sequencing Data: A Distance-Based Unsupervised Learning Model
Modeling and analyzing the human microbiome allows the assessment of the microbial community and its impact on human health. Microbiome composition can be quantified with 16S rRNA sequencing, which yields count data that are usually skewed and heavy-tailed with excess zeros. Clustering methods are usefu...
Main authors: | Yang, Dongyang; Xu, Wei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7589204/ https://www.ncbi.nlm.nih.gov/pubmed/33092203 http://dx.doi.org/10.3390/microorganisms8101612 |
_version_ | 1783600524836208640 |
---|---|
author | Yang, Dongyang Xu, Wei |
collection | PubMed |
description | Modeling and analyzing the human microbiome allows the assessment of the microbial community and its impact on human health. Microbiome composition can be quantified with 16S rRNA sequencing, which yields count data that are usually skewed and heavy-tailed with excess zeros. Clustering methods are useful in personalized medicine for identifying subgroups for patient stratification. However, there is currently no standardized clustering method for complex microbiome sequencing data. We propose a clustering algorithm with a specific beta diversity measure that can address the presence-absence bias encountered for sparse count data and effectively measure sample distances for sample stratification. Our distance measure is derived from a parametric mixture model that produces sample-specific distributions conditional on the observed operational taxonomic unit (OTU) counts and estimated mixture weights. The method can provide accurate estimates of the true zero proportions and thus construct a precise beta diversity measure. Extensive simulation studies suggest that the proposed method achieves substantial clustering improvement over some widely used distance measures when a large proportion of zeros is present. The proposed algorithm was applied to a human gut microbiome study on Parkinson’s disease to identify distinct microbiome states with biological interpretations. |
format | Online Article Text |
id | pubmed-7589204 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7589204 2020-10-29 Clustering on Human Microbiome Sequencing Data: A Distance-Based Unsupervised Learning Model Yang, Dongyang Xu, Wei Microorganisms Article MDPI 2020-10-20 /pmc/articles/PMC7589204/ /pubmed/33092203 http://dx.doi.org/10.3390/microorganisms8101612 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | Clustering on Human Microbiome Sequencing Data: A Distance-Based Unsupervised Learning Model |
topic | Article |
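
The description above outlines a distance-based clustering workflow: compute a beta diversity (between-sample) distance from OTU counts, then cluster samples on that distance. The sketch below illustrates only that general workflow, under loud assumptions: it uses the standard Bray-Curtis dissimilarity on relative abundances and a simple k-medoids loop, not the authors' mixture-model-derived distance, and the toy count table and all function names (`bray_curtis`, `to_proportions`, `distance_matrix`, `k_medoids`) are hypothetical and chosen for illustration.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def to_proportions(counts):
    """Convert raw OTU counts to relative abundances (all zeros if the sample is empty)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def distance_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = bray_curtis(samples[i], samples[j])
    return dist


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    Initialization is deterministic (the first k samples), which is an
    illustrative simplification rather than a recommended strategy.
    """
    n = len(dist)
    medoids = list(range(k))
    labels = [0] * n
    for _ in range(max_iter):
        # Assign each sample to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the previous medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes total distance to its cluster members.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Hypothetical sparse OTU count table: rows are samples, columns are OTUs.
otu_counts = [
    [30, 0, 12, 0, 0],
    [0, 25, 0, 0, 9],
    [22, 0, 0, 5, 0],
    [0, 40, 1, 0, 14],
    [41, 3, 17, 0, 0],
    [0, 18, 0, 0, 0],
]

proportions = [to_proportions(sample) for sample in otu_counts]
dist = distance_matrix(proportions)
labels, medoids = k_medoids(dist, k=2)
print(labels, medoids)
```

On this toy table, samples 0, 2 and 4 fall into one cluster and samples 1, 3 and 5 into the other. Bray-Curtis treats every observed zero as a true absence, which is the kind of behavior on sparse counts that the description says the proposed measure is meant to address; replacing `bray_curtis` with another distance leaves the clustering step unchanged.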