
Learning Microbial Community Structures with Supervised and Unsupervised Non-negative Matrix Factorization

BACKGROUND: Learning the structure of microbial communities is critical in understanding the different community structures and functions of microbes in distinct individuals. We view microbial communities as consisting of many subcommunities which are formed by certain groups of microbes functionall...


Bibliographic Details
Main authors: Cai, Yun, Gu, Hong, Kenney, Toby
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5579944/
https://www.ncbi.nlm.nih.gov/pubmed/28859695
http://dx.doi.org/10.1186/s40168-017-0323-1
author Cai, Yun
Gu, Hong
Kenney, Toby
collection PubMed
description BACKGROUND: Learning the structure of microbial communities is critical to understanding the different community structures and functions of microbes in distinct individuals. We view microbial communities as consisting of many subcommunities, each formed by a group of microbes that are functionally dependent on each other. The focus of this paper is on methods for extracting the subcommunities from the data, in particular Non-Negative Matrix Factorization (NMF). Our methods can be applied to both OTU data and functional metagenomic data. We apply the existing unsupervised NMF method and also develop a new supervised NMF method for extracting interpretable information from classification problems. RESULTS: The relevance of the subcommunities identified by NMF is demonstrated by their excellent performance for classification. Through three data examples, we demonstrate how to interpret the features identified by NMF to draw meaningful biological conclusions and discover hitherto unidentified patterns in the data. Comparing whole metagenomes of various mammals (Muegge et al., Science 332:970–974, 2011), we find the macrolide biosynthesis pathway in hindgut-fermenting herbivores, but not in carnivores. This is consistent with results in veterinary science that macrolides should not be given to non-ruminant herbivores. For time series microbiome data from various body sites (Caporaso et al., Genome Biol 12:50, 2011), a shift in the microbial communities is identified for one individual. The shift occurs at around the same time in the tongue and gut microbiomes, indicating that the shift is a genuine biological trait rather than an artefact of the method. For whole metagenome data from IBD patients and healthy controls (Qin et al., Nature 464:59–65, 2010), we identify differences in a number of pathways (some known, others new). CONCLUSIONS: NMF is a powerful tool for identifying the key features of microbial communities. These features can be used to solve difficult classification problems with a high degree of accuracy; they are also highly interpretable and can lead to important biological insights into the structure of the communities. In addition, NMF is a dimension-reduction method (similar to PCA): it reduces the extremely complex microbial data to a low-dimensional representation, making a number of analyses easier to perform, for example searching for temporal patterns in the microbiome. When we are interested in the differences between the structures of two groups of communities, supervised NMF provides a better way to identify them, while retaining all the advantages of NMF, such as interpretability and a simple biological intuition.
format Online
Article
Text
id pubmed-5579944
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5579944 2017-09-07 Learning Microbial Community Structures with Supervised and Unsupervised Non-negative Matrix Factorization Cai, Yun Gu, Hong Kenney, Toby Microbiome Research BioMed Central 2017-08-31 /pmc/articles/PMC5579944/ /pubmed/28859695 http://dx.doi.org/10.1186/s40168-017-0323-1 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Learning Microbial Community Structures with Supervised and Unsupervised Non-negative Matrix Factorization
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5579944/
https://www.ncbi.nlm.nih.gov/pubmed/28859695
http://dx.doi.org/10.1186/s40168-017-0323-1