MBBC: an efficient approach for metagenomic binning based on clustering
Main authors: | Wang, Ying; Hu, Haiyan; Li, Xiaoman |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2015 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339733/ https://www.ncbi.nlm.nih.gov/pubmed/25652152 http://dx.doi.org/10.1186/s12859-015-0473-8 |
author | Wang, Ying; Hu, Haiyan; Li, Xiaoman |
---|---|
collection | PubMed |
description | BACKGROUND: Binning environmental shotgun reads is one of the most fundamental tasks in metagenomic studies, in which mixed reads from different species or operational taxonomical units (OTUs) are separated into different groups. While dozens of binning methods are available, there is still room for improvement. RESULTS: We developed a novel taxonomy-independent approach called MBBC (Metagenomic Binning Based on Clustering) to cluster environmental shotgun reads, by considering k-mer frequency in reads and Markov properties of the inferred OTUs. Tested on twelve simulated datasets, MBBC reliably estimated the species number, the genome size, and the relative abundance of each species, independent of whether there are errors in reads. Tested on multiple experimental datasets, MBBC outperformed two state-of-the-art taxonomy-independent methods, in terms of the accuracy of the estimated species number, genome sizes, and percentages of correctly assigned reads, among other metrics. CONCLUSIONS: We have developed a novel method for binning metagenomic reads based on clustering. This method is demonstrated to reliably predict species numbers, genome sizes, relative species abundances, and k-mer coverage in simple datasets. Our method also has a high accuracy in read binning. The MBBC software is freely available at http://eecs.ucf.edu/~xiaoman/MBBC/MBBC.html. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0473-8) contains supplementary material, which is available to authorized users. |
id | pubmed-4339733 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics, Methodology Article; published 2015-02-05 |
rights | © Wang et al.; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
topic | Methodology Article |
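The description notes that MBBC clusters reads "by considering k-mer frequency in reads." As a generic illustration only, the Python sketch below shows what a per-read k-mer frequency profile looks like. It is not MBBC's implementation, and the function names `kmer_profile` and `euclidean` and the default k = 4 are assumptions made for this example. The sketch counts every overlapping DNA k-mer in a read, skips k-mers containing non-ACGT characters such as `N`, and normalizes the counts into a fixed-length vector that a clustering method could compare across reads.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(read: str, k: int = 4) -> list[float]:
    """Hypothetical helper: relative frequency of each of the 4**k DNA k-mers
    in `read`, listed in lexicographic order over A, C, G, T.
    Windows containing any character other than A, C, G, T are skipped.
    Returns an all-zero vector if the read has no valid k-mer."""
    read = read.upper()
    valid = set("ACGT")
    counts = Counter(
        read[i:i + k]
        for i in range(len(read) - k + 1)
        if set(read[i:i + k]) <= valid
    )
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [counts[km] / total if total else 0.0 for km in all_kmers]


def euclidean(p: list[float], q: list[float]) -> float:
    """Hypothetical distance between two profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    # Two toy reads; real inputs would be shotgun reads from a FASTA/FASTQ file.
    r1 = "ACGTACGTAAGGCCTT"
    r2 = "ACGTACGTAAGGNCTT"
    p1, p2 = kmer_profile(r1, k=2), kmer_profile(r2, k=2)
    print(len(p1), round(euclidean(p1, p2), 4))
```

Short reads yield sparse profiles once k grows (4**k possible k-mers versus roughly L - k + 1 windows per read), which is why this toy example uses k = 2. The value of k and the choice of distance here say nothing about the parameters or the Markov-based model that MBBC itself uses.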