OGRE: Overlap Graph-based metagenomic Read clustEring

MOTIVATION: The microbes that live in an environment can be identified from the combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising approach to reduce the size of metagenomic datasets is by clustering rea...

Bibliographic Details
Main authors: Balvert, Marleen; Luo, Xiao; Hauptfeld, Ernestina; Schönhuth, Alexander; Dutilh, Bas E
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8128468/
https://www.ncbi.nlm.nih.gov/pubmed/32871010
http://dx.doi.org/10.1093/bioinformatics/btaa760
_version_ 1783694117693292544
author Balvert, Marleen
Luo, Xiao
Hauptfeld, Ernestina
Schönhuth, Alexander
Dutilh, Bas E
author_facet Balvert, Marleen
Luo, Xiao
Hauptfeld, Ernestina
Schönhuth, Alexander
Dutilh, Bas E
author_sort Balvert, Marleen
collection PubMed
description MOTIVATION: The microbes that live in an environment can be identified from the combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising approach to reduce the size of metagenomic datasets is by clustering reads into groups based on their overlaps. Clustering reads is valuable to facilitate downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes. RESULTS: We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity. CONCLUSION: OGRE is the only method that can successfully cluster reads in species-specific clusters for large metagenomic datasets without running into computation time or memory issues. AVAILABILITY AND IMPLEMENTATION: Code is made available on GitHub (https://github.com/Marleen1/OGRE). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8128468
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8128468 2021-05-21 OGRE: Overlap Graph-based metagenomic Read clustEring Balvert, Marleen Luo, Xiao Hauptfeld, Ernestina Schönhuth, Alexander Dutilh, Bas E Bioinformatics Original Papers MOTIVATION: The microbes that live in an environment can be identified from the combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising approach to reduce the size of metagenomic datasets is by clustering reads into groups based on their overlaps. Clustering reads is valuable to facilitate downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes. RESULTS: We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity. CONCLUSION: OGRE is the only method that can successfully cluster reads in species-specific clusters for large metagenomic datasets without running into computation time or memory issues. AVAILABILITY AND IMPLEMENTATION: Code is made available on GitHub (https://github.com/Marleen1/OGRE). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2020-09-01 /pmc/articles/PMC8128468/ /pubmed/32871010 http://dx.doi.org/10.1093/bioinformatics/btaa760 Text en © The Author(s) 2020. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Papers
Balvert, Marleen
Luo, Xiao
Hauptfeld, Ernestina
Schönhuth, Alexander
Dutilh, Bas E
OGRE: Overlap Graph-based metagenomic Read clustEring
title OGRE: Overlap Graph-based metagenomic Read clustEring
title_full OGRE: Overlap Graph-based metagenomic Read clustEring
title_fullStr OGRE: Overlap Graph-based metagenomic Read clustEring
title_full_unstemmed OGRE: Overlap Graph-based metagenomic Read clustEring
title_short OGRE: Overlap Graph-based metagenomic Read clustEring
title_sort ogre: overlap graph-based metagenomic read clustering
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8128468/
https://www.ncbi.nlm.nih.gov/pubmed/32871010
http://dx.doi.org/10.1093/bioinformatics/btaa760
work_keys_str_mv AT balvertmarleen ogreoverlapgraphbasedmetagenomicreadclustering
AT luoxiao ogreoverlapgraphbasedmetagenomicreadclustering
AT hauptfeldernestina ogreoverlapgraphbasedmetagenomicreadclustering
AT schonhuthalexander ogreoverlapgraphbasedmetagenomicreadclustering
AT dutilhbase ogreoverlapgraphbasedmetagenomicreadclustering