
Mantis: flexible and consensus-driven genome annotation

BACKGROUND: The rapid development of the (meta-)omics fields has produced an unprecedented amount of high-resolution and high-fidelity data. Through the use of these datasets we can infer the role of previously functionally unannotated proteins from single organisms and consortia. In this context, p...


Bibliographic Details
Main Authors: Queirós, Pedro; Delogu, Francesco; Hickl, Oskar; May, Patrick; Wilmes, Paul
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8170692/
https://www.ncbi.nlm.nih.gov/pubmed/34076241
http://dx.doi.org/10.1093/gigascience/giab042
_version_ 1783702295927586816
author Queirós, Pedro
Delogu, Francesco
Hickl, Oskar
May, Patrick
Wilmes, Paul
author_facet Queirós, Pedro
Delogu, Francesco
Hickl, Oskar
May, Patrick
Wilmes, Paul
author_sort Queirós, Pedro
collection PubMed
description BACKGROUND: The rapid development of the (meta-)omics fields has produced an unprecedented amount of high-resolution and high-fidelity data. Through the use of these datasets we can infer the role of previously functionally unannotated proteins from single organisms and consortia. In this context, protein function annotation can be described as the identification of regions of interest (i.e., domains) in protein sequences and the assignment of biological functions. Despite the existence of numerous tools, challenges remain in terms of speed, flexibility, and reproducibility. In the big data era, it is also increasingly important to cease limiting our findings to a single reference, coalescing knowledge from different data sources, and thus overcoming some limitations in overly relying on computationally generated data from single sources. RESULTS: We implemented a protein annotation tool, Mantis, which uses database identifiers intersection and text mining to integrate knowledge from multiple reference data sources into a single consensus-driven output. Mantis is flexible, allowing for the customization of reference data and execution parameters, and is reproducible across different research goals and user environments. We implemented a depth-first search algorithm for domain-specific annotation, which significantly improved annotation performance compared to sequence-wide annotation. The parallelized implementation of Mantis results in short runtimes while also outputting high coverage and high-quality protein function annotations. CONCLUSIONS: Mantis is a protein function annotation tool that produces high-quality consensus-driven protein annotations. It is easy to set up, customize, and use, scaling from single genomes to large metagenomes. Mantis is available under the MIT license at https://github.com/PedroMTQ/mantis.
format Online
Article
Text
id pubmed-8170692
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8170692 2021-06-02 Mantis: flexible and consensus-driven genome annotation Queirós, Pedro Delogu, Francesco Hickl, Oskar May, Patrick Wilmes, Paul Gigascience Technical Note Oxford University Press 2021-06-02 /pmc/articles/PMC8170692/ /pubmed/34076241 http://dx.doi.org/10.1093/gigascience/giab042 Text en © The Author(s) 2021. Published by Oxford University Press GigaScience. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Technical Note
Queirós, Pedro
Delogu, Francesco
Hickl, Oskar
May, Patrick
Wilmes, Paul
Mantis: flexible and consensus-driven genome annotation
title Mantis: flexible and consensus-driven genome annotation
title_full Mantis: flexible and consensus-driven genome annotation
title_fullStr Mantis: flexible and consensus-driven genome annotation
title_full_unstemmed Mantis: flexible and consensus-driven genome annotation
title_short Mantis: flexible and consensus-driven genome annotation
title_sort mantis: flexible and consensus-driven genome annotation
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8170692/
https://www.ncbi.nlm.nih.gov/pubmed/34076241
http://dx.doi.org/10.1093/gigascience/giab042
work_keys_str_mv AT queirospedro mantisflexibleandconsensusdrivengenomeannotation
AT delogufrancesco mantisflexibleandconsensusdrivengenomeannotation
AT hickloskar mantisflexibleandconsensusdrivengenomeannotation
AT maypatrick mantisflexibleandconsensusdrivengenomeannotation
AT wilmespaul mantisflexibleandconsensusdrivengenomeannotation
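
The abstract in this record mentions a depth-first search algorithm for domain-specific annotation but gives no implementation details. The following is a minimal illustrative sketch of that idea, not the Mantis code: it assumes each reference hit on a protein is a hypothetical (start, end, score) tuple with inclusive coordinates and a positive score, and that the preferred annotation is the set of mutually non-overlapping hits with the highest summed score. The names `Hit`, `overlaps`, and `best_combination` are invented for this example.

```python
from typing import List, Tuple

# Hypothetical hit layout: (start, end, score), inclusive residue coordinates.
Hit = Tuple[int, int, float]


def overlaps(a: Hit, b: Hit) -> bool:
    """Return True when two hits share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def best_combination(hits: List[Hit]) -> List[Hit]:
    """Depth-first search over all sets of mutually non-overlapping hits.

    Assumption for this sketch: scores are positive and the best set is the
    one with the highest summed score.
    """
    ordered = sorted(hits)
    best: List[Hit] = []
    best_score = 0.0

    def dfs(index: int, chosen: List[Hit], score: float) -> None:
        nonlocal best, best_score
        # Record the current branch if it beats the best set seen so far.
        if score > best_score:
            best, best_score = list(chosen), score
        # Extend the branch with each later hit that fits, then backtrack.
        for i in range(index, len(ordered)):
            hit = ordered[i]
            if all(not overlaps(hit, c) for c in chosen):
                chosen.append(hit)
                dfs(i + 1, chosen, score + hit[2])
                chosen.pop()

    dfs(0, [], 0.0)
    return best


if __name__ == "__main__":
    # One sequence-wide hit versus two smaller domain hits on the same protein.
    example = [(1, 100, 50.0), (1, 40, 30.0), (50, 100, 35.0)]
    print(best_combination(example))
```

In the example, the two domain-level hits together outscore the single sequence-wide hit, which is the kind of case where a domain-specific search can differ from sequence-wide annotation. The search is exhaustive, so it is only practical for the small number of hits typically found on one protein.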