MetaBCC-LR: metagenomics binning by coverage and composition for long reads
MOTIVATION: Metagenomics studies have provided key insights into the composition and structure of microbial communities found in different environments. Among the techniques used to analyse metagenomic data, binning is considered a crucial step to characterize the different species of micro-organism...
Main authors: | Wickramarachchi, Anuradha; Mallawaarachchi, Vijini; Rajan, Vaibhav; Lin, Yu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Bioinformatics of Microbes and Microbiomes |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355282/ https://www.ncbi.nlm.nih.gov/pubmed/32657364 http://dx.doi.org/10.1093/bioinformatics/btaa441 |
_version_ | 1783558244368646144 |
---|---|
author | Wickramarachchi, Anuradha Mallawaarachchi, Vijini Rajan, Vaibhav Lin, Yu |
author_facet | Wickramarachchi, Anuradha Mallawaarachchi, Vijini Rajan, Vaibhav Lin, Yu |
author_sort | Wickramarachchi, Anuradha |
collection | PubMed |
description | MOTIVATION: Metagenomics studies have provided key insights into the composition and structure of microbial communities found in different environments. Among the techniques used to analyse metagenomic data, binning is considered a crucial step to characterize the different species of micro-organisms present. The use of short-read data in most binning tools poses several limitations, such as insufficient species-specific signal, and the emergence of long-read sequencing technologies offers us opportunities to surmount them. However, most current metagenomic binning tools have been developed for short reads. The few tools that can process long reads either do not scale with increasing input size or require a database with reference genomes that are often unknown. In this article, we present MetaBCC-LR, a scalable reference-free binning method which clusters long reads directly based on their k-mer coverage histograms and oligonucleotide composition. RESULTS: We evaluate MetaBCC-LR on multiple simulated and real metagenomic long-read datasets with varying coverages and error rates. Our experiments demonstrate that MetaBCC-LR substantially outperforms state-of-the-art reference-free binning tools, achieving ∼13% improvement in F1-score and ∼30% improvement in ARI compared to the best previous tools. Moreover, we show that using MetaBCC-LR before long-read assembly helps to enhance the assembly quality while significantly reducing the assembly cost in terms of time and memory usage. The efficiency and accuracy of MetaBCC-LR pave the way for more effective long-read-based metagenomics analyses to support a wide range of applications. AVAILABILITY AND IMPLEMENTATION: The source code is freely available at: https://github.com/anuradhawick/MetaBCC-LR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7355282 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7355282 2020-07-16 MetaBCC-LR: metagenomics binning by coverage and composition for long reads Wickramarachchi, Anuradha Mallawaarachchi, Vijini Rajan, Vaibhav Lin, Yu Bioinformatics Bioinformatics of Microbes and Microbiomes Oxford University Press 2020-07 2020-07-13 /pmc/articles/PMC7355282/ /pubmed/32657364 http://dx.doi.org/10.1093/bioinformatics/btaa441 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Bioinformatics of Microbes and Microbiomes Wickramarachchi, Anuradha Mallawaarachchi, Vijini Rajan, Vaibhav Lin, Yu MetaBCC-LR: metagenomics binning by coverage and composition for long reads |
title | MetaBCC-LR: metagenomics binning by coverage and composition for long reads |
title_full | MetaBCC-LR: metagenomics binning by coverage and composition for long reads |
title_fullStr | MetaBCC-LR: metagenomics binning by coverage and composition for long reads |
title_full_unstemmed | MetaBCC-LR: metagenomics binning by coverage and composition for long reads |
title_short | MetaBCC-LR: metagenomics binning by coverage and composition for long reads |
title_sort | metabcc-lr: metagenomics binning by coverage and composition for long reads |
topic | Bioinformatics of Microbes and Microbiomes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355282/ https://www.ncbi.nlm.nih.gov/pubmed/32657364 http://dx.doi.org/10.1093/bioinformatics/btaa441 |
work_keys_str_mv | AT wickramarachchianuradha metabcclrmetagenomicsbinningbycoverageandcompositionforlongreads AT mallawaarachchivijini metabcclrmetagenomicsbinningbycoverageandcompositionforlongreads AT rajanvaibhav metabcclrmetagenomicsbinningbycoverageandcompositionforlongreads AT linyu metabcclrmetagenomicsbinningbycoverageandcompositionforlongreads |