MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample

Motivation: Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples with low-abundance species or (ii) samples (even with high abundance) when there are many extremely...

Bibliographic Details
Main Authors:	Wang, Yi; Leung, Henry C.M.; Yiu, S.M.; Chin, Francis Y.L.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2012
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436824/ https://www.ncbi.nlm.nih.gov/pubmed/22962452 http://dx.doi.org/10.1093/bioinformatics/bts397

_version_	1782242706187091968
author	Wang, Yi; Leung, Henry C.M.; Yiu, S.M.; Chin, Francis Y.L.
collection	PubMed
description	Motivation: Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples with low-abundance species or (ii) samples (even with high abundance) when there are many extremely low-abundance species. These two problems are common for real metagenomic datasets. Binning methods that can solve these problems are desirable. Results: We proposed a two-round binning method (MetaCluster 5.0) that aims at identifying both low-abundance and high-abundance species in the presence of a large amount of noise due to many extremely low-abundance species. In summary, MetaCluster 5.0 uses a filtering strategy to remove noise from the extremely low-abundance species. It separates reads of high-abundance species from those of low-abundance species in two different rounds. To overcome the issue of low coverage for low-abundance species, multiple w values are used to group reads with overlapping w-mers, whereas reads from high-abundance species are grouped with high confidence based on a large w and then binning expands to low-abundance species using a relaxed (shorter) w. Compared to the recent tools, TOSS and MetaCluster 4.0, MetaCluster 5.0 can find more species (especially those with low abundance of, say, 6× to 10×) and can achieve better sensitivity and specificity using less memory and running time. Availability: http://i.cs.hku.hk/~alse/MetaCluster/ Contact: chin@cs.hku.hk
format	Online Article Text
id	pubmed-3436824
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-3436824 2012-12-12 Bioinformatics Original Papers Oxford University Press 2012-09-15 2012-09-03 /pmc/articles/PMC3436824/ /pubmed/22962452 http://dx.doi.org/10.1093/bioinformatics/bts397 Text en © The Author(s) (2012). Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436824/ https://www.ncbi.nlm.nih.gov/pubmed/22962452 http://dx.doi.org/10.1093/bioinformatics/bts397