UMGAP: the Unipept MetaGenomics Analysis Pipeline

BACKGROUND: Shotgun metagenomics yields ever richer and larger data volumes on the complex communities living in diverse environments. Extracting deep insights from the raw reads heavily depends on the availability of fast, accurate and user-friendly biodiversity analysis tools. RESULTS: Because env...

Bibliographic Details
Main authors: Van der Jeugt, Felix, Maertens, Rien, Steyaert, Aranka, Verschaffelt, Pieter, De Tender, Caroline, Dawyndt, Peter, Mesuere, Bart
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9188040/
https://www.ncbi.nlm.nih.gov/pubmed/35689184
http://dx.doi.org/10.1186/s12864-022-08542-4
author Van der Jeugt, Felix
Maertens, Rien
Steyaert, Aranka
Verschaffelt, Pieter
De Tender, Caroline
Dawyndt, Peter
Mesuere, Bart
author_facet Van der Jeugt, Felix
Maertens, Rien
Steyaert, Aranka
Verschaffelt, Pieter
De Tender, Caroline
Dawyndt, Peter
Mesuere, Bart
author_sort Van der Jeugt, Felix
collection PubMed
description BACKGROUND: Shotgun metagenomics yields ever richer and larger data volumes on the complex communities living in diverse environments. Extracting deep insights from the raw reads heavily depends on the availability of fast, accurate and user-friendly biodiversity analysis tools. RESULTS: Because environmental samples may contain strains and species that are not covered in reference databases and because protein sequences are more conserved than the genes encoding them, we explore the alternative route of taxonomic profiling based on protein coding regions translated from the shotgun metagenomics reads, instead of directly processing the DNA reads. We therefore developed the Unipept MetaGenomics Analysis Pipeline (UMGAP), a highly versatile suite of open source tools that are implemented in Rust and support parallelization to achieve optimal performance. Six preconfigured pipelines with different performance trade-offs were carefully selected, and benchmarked against a selection of state-of-the-art shotgun metagenomics taxonomic profiling tools. CONCLUSIONS: UMGAP’s protein space detour for taxonomic profiling makes it competitive with state-of-the-art shotgun metagenomics tools. Despite our design choices of an extra protein translation step, a broad spectrum index that can identify archaea, bacteria, eukaryotes and viruses, and a highly configurable non-monolithic design, UMGAP achieves low runtime, manageable memory footprint and high accuracy. Its interactive visualizations allow for easy exploration and comparison of complex communities. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s12864-022-08542-4).
format Online
Article
Text
id pubmed-9188040
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9188040 2022-06-12 UMGAP: the Unipept MetaGenomics Analysis Pipeline Van der Jeugt, Felix Maertens, Rien Steyaert, Aranka Verschaffelt, Pieter De Tender, Caroline Dawyndt, Peter Mesuere, Bart BMC Genomics Software BACKGROUND: Shotgun metagenomics yields ever richer and larger data volumes on the complex communities living in diverse environments. Extracting deep insights from the raw reads heavily depends on the availability of fast, accurate and user-friendly biodiversity analysis tools. RESULTS: Because environmental samples may contain strains and species that are not covered in reference databases and because protein sequences are more conserved than the genes encoding them, we explore the alternative route of taxonomic profiling based on protein coding regions translated from the shotgun metagenomics reads, instead of directly processing the DNA reads. We therefore developed the Unipept MetaGenomics Analysis Pipeline (UMGAP), a highly versatile suite of open source tools that are implemented in Rust and support parallelization to achieve optimal performance. Six preconfigured pipelines with different performance trade-offs were carefully selected, and benchmarked against a selection of state-of-the-art shotgun metagenomics taxonomic profiling tools. CONCLUSIONS: UMGAP’s protein space detour for taxonomic profiling makes it competitive with state-of-the-art shotgun metagenomics tools. Despite our design choices of an extra protein translation step, a broad spectrum index that can identify archaea, bacteria, eukaryotes and viruses, and a highly configurable non-monolithic design, UMGAP achieves low runtime, manageable memory footprint and high accuracy. Its interactive visualizations allow for easy exploration and comparison of complex communities. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s12864-022-08542-4).
BioMed Central 2022-06-10 /pmc/articles/PMC9188040/ /pubmed/35689184 http://dx.doi.org/10.1186/s12864-022-08542-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Software
Van der Jeugt, Felix
Maertens, Rien
Steyaert, Aranka
Verschaffelt, Pieter
De Tender, Caroline
Dawyndt, Peter
Mesuere, Bart
UMGAP: the Unipept MetaGenomics Analysis Pipeline
title UMGAP: the Unipept MetaGenomics Analysis Pipeline
title_full UMGAP: the Unipept MetaGenomics Analysis Pipeline
title_fullStr UMGAP: the Unipept MetaGenomics Analysis Pipeline
title_full_unstemmed UMGAP: the Unipept MetaGenomics Analysis Pipeline
title_short UMGAP: the Unipept MetaGenomics Analysis Pipeline
title_sort umgap: the unipept metagenomics analysis pipeline
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9188040/
https://www.ncbi.nlm.nih.gov/pubmed/35689184
http://dx.doi.org/10.1186/s12864-022-08542-4