BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation
Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial c...
Main authors: | Graham, Elaina D.; Heidelberg, John F.; Tully, Benjamin J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2017 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5345454/ https://www.ncbi.nlm.nih.gov/pubmed/28289564 http://dx.doi.org/10.7717/peerj.3035 |
_version_ | 1782513723539193856 |
---|---|
author | Graham, Elaina D.; Heidelberg, John F.; Tully, Benjamin J. |
author_facet | Graham, Elaina D.; Heidelberg, John F.; Tully, Benjamin J. |
author_sort | Graham, Elaina D. |
collection | PubMed |
description | Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, with metagenomic studies there is the computational hurdle of ‘binning’ contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low coverage/abundance organisms and closely related taxa/strains. We are introducing a new binning method, BinSanity, that utilizes the clustering algorithm affinity propagation (AP), to cluster assemblies using coverage with compositional based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms. This separation of composition and coverage based clustering reduces bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that BinSanity has a higher precision, recall, and Adjusted Rand Index compared to five commonly implemented methods. When tested on a previously published environmental metagenome, BinSanity generated high completion and low redundancy bins corresponding with the published metagenome-assembled genomes. |
format | Online Article Text |
id | pubmed-5345454 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-53454542017-03-13 BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation Graham, Elaina D. Heidelberg, John F. Tully, Benjamin J. PeerJ Computational Biology Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, with metagenomic studies there is the computational hurdle of ‘binning’ contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low coverage/abundance organisms and closely related taxa/strains. We are introducing a new binning method, BinSanity, that utilizes the clustering algorithm affinity propagation (AP), to cluster assemblies using coverage with compositional based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms. This separation of composition and coverage based clustering reduces bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that BinSanity has a higher precision, recall, and Adjusted Rand Index compared to five commonly implemented methods. When tested on a previously published environmental metagenome, BinSanity generated high completion and low redundancy bins corresponding with the published metagenome-assembled genomes. PeerJ Inc. 
2017-03-08 /pmc/articles/PMC5345454/ /pubmed/28289564 http://dx.doi.org/10.7717/peerj.3035 Text en ©2017 Graham et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Computational Biology Graham, Elaina D. Heidelberg, John F. Tully, Benjamin J. BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation |
title | BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5345454/ https://www.ncbi.nlm.nih.gov/pubmed/28289564 http://dx.doi.org/10.7717/peerj.3035 |
work_keys_str_mv | AT grahamelainad binsanityunsupervisedclusteringofenvironmentalmicrobialassembliesusingcoverageandaffinitypropagation AT heidelbergjohnf binsanityunsupervisedclusteringofenvironmentalmicrobialassembliesusingcoverageandaffinitypropagation AT tullybenjaminj binsanityunsupervisedclusteringofenvironmentalmicrobialassembliesusingcoverageandaffinitypropagation |
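
The abstract above says that BinSanity refines coverage-based clusters using two compositional signals: tetranucleotide frequency and percent GC content. As a rough illustration of what those two features are, the following is a minimal Python sketch that computes them for a single contig. It is not BinSanity's own code. The function names `gc_percent` and `tetranucleotide_frequencies` and the example sequence are hypothetical, and details such as skipping windows with ambiguous bases are assumptions made for this sketch rather than choices documented in the record.

```python
from collections import Counter
from itertools import product


def gc_percent(seq):
    """Percent of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def tetranucleotide_frequencies(seq):
    """Relative frequency of each of the 256 possible 4-mers in seq.

    Assumption for this sketch: windows containing anything other
    than A, C, G, or T (for example N) are skipped.
    """
    seq = seq.upper()
    valid = set("ACGT")
    all_kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= valid
    )
    total = sum(counts.values())
    # Always return all 256 keys so vectors from different contigs align.
    return {k: (counts[k] / total if total else 0.0) for k in all_kmers}


if __name__ == "__main__":
    contig = "ATGCGCGCTTAACGGCATGC"  # made-up sequence, illustration only
    print(gc_percent(contig))
    freqs = tetranucleotide_frequencies(contig)
    print(len(freqs), sum(freqs.values()))
```

Returning a fixed set of 256 keys for every contig keeps the feature vectors the same length, which is what a downstream clustering step would typically need.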