MBBC: an efficient approach for metagenomic binning based on clustering

BACKGROUND: Binning environmental shotgun reads is one of the most fundamental tasks in metagenomic studies, in which mixed reads from different species or operational taxonomical units (OTUs) are separated into different groups. While dozens of binning methods are available, there is still room for...

Bibliographic Details
Main authors: Wang, Ying; Hu, Haiyan; Li, Xiaoman
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339733/
https://www.ncbi.nlm.nih.gov/pubmed/25652152
http://dx.doi.org/10.1186/s12859-015-0473-8
author Wang, Ying
Hu, Haiyan
Li, Xiaoman
author_facet Wang, Ying
Hu, Haiyan
Li, Xiaoman
author_sort Wang, Ying
collection PubMed
description BACKGROUND: Binning environmental shotgun reads is one of the most fundamental tasks in metagenomic studies, in which mixed reads from different species or operational taxonomical units (OTUs) are separated into different groups. While dozens of binning methods are available, there is still room for improvement. RESULTS: We developed a novel taxonomy-independent approach called MBBC (Metagenomic Binning Based on Clustering) to cluster environmental shotgun reads, by considering k-mer frequency in reads and Markov properties of the inferred OTUs. Tested on twelve simulated datasets, MBBC reliably estimated the species number, the genome size, and the relative abundance of each species, independent of whether there are errors in reads. Tested on multiple experimental datasets, MBBC outperformed two state-of-the-art taxonomy-independent methods, in terms of the accuracy of the estimated species number, genome sizes, and percentages of correctly assigned reads, among other metrics. CONCLUSIONS: We have developed a novel method for binning metagenomic reads based on clustering. This method is demonstrated to reliably predict species numbers, genome sizes, relative species abundances, and k-mer coverage in simple datasets. Our method also has a high accuracy in read binning. The MBBC software is freely available at http://eecs.ucf.edu/~xiaoman/MBBC/MBBC.html. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0473-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4339733
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-43397332015-02-26 MBBC: an efficient approach for metagenomic binning based on clustering Wang, Ying Hu, Haiyan Li, Xiaoman BMC Bioinformatics Methodology Article BACKGROUND: Binning environmental shotgun reads is one of the most fundamental tasks in metagenomic studies, in which mixed reads from different species or operational taxonomical units (OTUs) are separated into different groups. While dozens of binning methods are available, there is still room for improvement. RESULTS: We developed a novel taxonomy-independent approach called MBBC (Metagenomic Binning Based on Clustering) to cluster environmental shotgun reads, by considering k-mer frequency in reads and Markov properties of the inferred OTUs. Tested on twelve simulated datasets, MBBC reliably estimated the species number, the genome size, and the relative abundance of each species, independent of whether there are errors in reads. Tested on multiple experimental datasets, MBBC outperformed two state-of-the-art taxonomy-independent methods, in terms of the accuracy of the estimated species number, genome sizes, and percentages of correctly assigned reads, among other metrics. CONCLUSIONS: We have developed a novel method for binning metagenomic reads based on clustering. This method is demonstrated to reliably predict species numbers, genome sizes, relative species abundances, and k-mer coverage in simple datasets. Our method also has a high accuracy in read binning. The MBBC software is freely available at http://eecs.ucf.edu/~xiaoman/MBBC/MBBC.html. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0473-8) contains supplementary material, which is available to authorized users. BioMed Central 2015-02-05 /pmc/articles/PMC4339733/ /pubmed/25652152 http://dx.doi.org/10.1186/s12859-015-0473-8 Text en © Wang et al.; licensee BioMed Central. 
2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Wang, Ying
Hu, Haiyan
Li, Xiaoman
MBBC: an efficient approach for metagenomic binning based on clustering
title MBBC: an efficient approach for metagenomic binning based on clustering
title_full MBBC: an efficient approach for metagenomic binning based on clustering
title_fullStr MBBC: an efficient approach for metagenomic binning based on clustering
title_full_unstemmed MBBC: an efficient approach for metagenomic binning based on clustering
title_short MBBC: an efficient approach for metagenomic binning based on clustering
title_sort mbbc: an efficient approach for metagenomic binning based on clustering
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339733/
https://www.ncbi.nlm.nih.gov/pubmed/25652152
http://dx.doi.org/10.1186/s12859-015-0473-8
work_keys_str_mv AT wangying mbbcanefficientapproachformetagenomicbinningbasedonclustering
AT huhaiyan mbbcanefficientapproachformetagenomicbinningbasedonclustering
AT lixiaoman mbbcanefficientapproachformetagenomicbinningbasedonclustering