MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample
Motivation: Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples with low-abundance species or (ii) samples (even with high abundance) when there are many extremely...
Main Authors: | Wang, Yi; Leung, Henry C.M.; Yiu, S.M.; Chin, Francis Y.L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2012 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436824/ https://www.ncbi.nlm.nih.gov/pubmed/22962452 http://dx.doi.org/10.1093/bioinformatics/bts397 |
_version_ | 1782242706187091968 |
---|---|
author | Wang, Yi; Leung, Henry C.M.; Yiu, S.M.; Chin, Francis Y.L. |
author_facet | Wang, Yi; Leung, Henry C.M.; Yiu, S.M.; Chin, Francis Y.L. |
author_sort | Wang, Yi |
collection | PubMed |
description | Motivation: Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples with low-abundance species or (ii) samples (even with high abundance) when there are many extremely low-abundance species. These two problems are common for real metagenomic datasets. Binning methods that can solve these problems are desirable. Results: We propose a two-round binning method (MetaCluster 5.0) that aims at identifying both low-abundance and high-abundance species in the presence of a large amount of noise due to many extremely low-abundance species. In summary, MetaCluster 5.0 uses a filtering strategy to remove noise from the extremely low-abundance species. It separates reads of high-abundance species from those of low-abundance species in two different rounds. To overcome the issue of low coverage for low-abundance species, multiple w values are used to group reads with overlapping w-mers: reads from high-abundance species are grouped with high confidence based on a large w, and binning then expands to low-abundance species using a relaxed (shorter) w. Compared with the recent tools TOSS and MetaCluster 4.0, MetaCluster 5.0 can find more species (especially those with low abundance of, say, 6× to 10×) and can achieve better sensitivity and specificity using less memory and running time. Availability: http://i.cs.hku.hk/~alse/MetaCluster/ Contact: chin@cs.hku.hk |
format | Online Article Text |
id | pubmed-3436824 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3436824 2012-12-12 MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample Wang, Yi Leung, Henry C.M. Yiu, S.M. Chin, Francis Y.L. Bioinformatics Original Papers Oxford University Press 2012-09-15 2012-09-03 /pmc/articles/PMC3436824/ /pubmed/22962452 http://dx.doi.org/10.1093/bioinformatics/bts397 Text en © The Author(s) (2012). Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436824/ https://www.ncbi.nlm.nih.gov/pubmed/22962452 http://dx.doi.org/10.1093/bioinformatics/bts397 |
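
The description in this record says that reads are grouped by overlapping w-mers, first with a large w and then with a relaxed (shorter) w. The sketch below is only a rough illustration of that idea under simplifying assumptions; it is not the MetaCluster 5.0 implementation. The function names (`wmers`, `group_reads`, `two_round_binning`), the union-find grouping, and the rule that reads left ungrouped after both rounds count as noise are all hypothetical choices made for this example.

```python
from collections import defaultdict


def wmers(read, w):
    """Return the set of distinct length-w substrings (w-mers) of a read."""
    return {read[i:i + w] for i in range(len(read) - w + 1)}


class DisjointSet:
    """Minimal union-find structure over the integers 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smaller index as the root so results are deterministic.
            self.parent[max(ra, rb)] = min(ra, rb)


def group_reads(reads, w):
    """Group read indices so that reads sharing any w-mer end up together."""
    index = defaultdict(list)  # w-mer -> indices of reads containing it
    for i, read in enumerate(reads):
        for mer in wmers(read, w):
            index[mer].append(i)
    ds = DisjointSet(len(reads))
    for ids in index.values():
        for j in ids[1:]:
            ds.union(ids[0], j)
    groups = defaultdict(list)
    for i in range(len(reads)):
        groups[ds.find(i)].append(i)
    return sorted(groups.values())


def two_round_binning(reads, large_w, small_w):
    """Illustrative two-round grouping (assumption, not the published method).

    Round 1 groups reads with a strict (large) w. Reads left alone are
    regrouped in round 2 with a relaxed (small) w. Reads still alone after
    round 2 are reported as noise.
    """
    round1 = group_reads(reads, large_w)
    confident = [g for g in round1 if len(g) > 1]
    leftover = [g[0] for g in round1 if len(g) == 1]
    round2 = group_reads([reads[i] for i in leftover], small_w)
    relaxed = [[leftover[k] for k in g] for g in round2 if len(g) > 1]
    noise = [leftover[g[0]] for g in round2 if len(g) == 1]
    return confident, relaxed, noise
```

Calling `two_round_binning(reads, large_w, small_w)` on a list of read strings returns three lists of read indices: groups found with the large w, groups found only with the small w, and ungrouped reads. A real binning tool would also need to handle reverse complements, sequencing errors, and w-mer frequency thresholds, none of which this sketch attempts.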