BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation

Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial c...

Bibliographic Details
Main authors: Graham, Elaina D.; Heidelberg, John F.; Tully, Benjamin J.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2017
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5345454/
https://www.ncbi.nlm.nih.gov/pubmed/28289564
http://dx.doi.org/10.7717/peerj.3035
Collection: PubMed
Description: Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, with metagenomic studies there is the computational hurdle of ‘binning’ contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low coverage/abundance organisms and closely related taxa/strains. We are introducing a new binning method, BinSanity, that utilizes the clustering algorithm affinity propagation (AP), to cluster assemblies using coverage with compositional based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms. This separation of composition and coverage based clustering reduces bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that BinSanity has a higher precision, recall, and Adjusted Rand Index compared to five commonly implemented methods. When tested on a previously published environmental metagenome, BinSanity generated high completion and low redundancy bins corresponding with the published metagenome-assembled genomes.
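Illustrative note: the description names two compositional features used for refinement, tetranucleotide frequency and percent GC content. The following is a minimal sketch of how such features could be computed for a single contig. It is not taken from BinSanity itself; the function names gc_percent and tetranucleotide_frequencies are hypothetical, and the sketch assumes that windows containing ambiguous bases are skipped and that reverse complements are not merged, which the actual tool may handle differently.

```python
from collections import Counter
from itertools import product
from typing import Dict

# All 256 possible 4-mers over the unambiguous DNA alphabet.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_percent(seq: str) -> float:
    """Percent of unambiguous bases (A, C, G, T) that are G or C.

    Hypothetical helper for illustration; returns 0.0 when the
    sequence contains no unambiguous bases.
    """
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (counts["G"] + counts["C"]) / acgt


def tetranucleotide_frequencies(seq: str) -> Dict[str, float]:
    """Relative frequency of each 4-mer using overlapping windows.

    Assumption: windows containing anything other than A, C, G, T are
    ignored, and reverse-complement 4-mers are counted separately.
    """
    seq = seq.upper()
    windows = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(windows[k] for k in TETRAMERS)
    if total == 0:
        return {k: 0.0 for k in TETRAMERS}
    return {k: windows[k] / total for k in TETRAMERS}


if __name__ == "__main__":
    contig = "ATGCGCGATATGCGCNATTA"  # made-up example sequence
    print(round(gc_percent(contig), 2))
    freqs = tetranucleotide_frequencies(contig)
    print({k: round(v, 3) for k, v in freqs.items() if v > 0})
```

Each contig then yields a 256-value frequency vector plus one GC value, which is the kind of compositional profile a refinement step could compare between contigs placed in the same coverage-based bin.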
Record ID: pubmed-5345454
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Publication date: 2017-03-08 (PeerJ, Computational Biology; PeerJ Inc.). Text in English. ©2017 Graham et al.
License: http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.