
MicrobeAnnotator: a user-friendly, comprehensive functional annotation pipeline for microbial genomes

BACKGROUND: High-throughput sequencing has increased the number of available microbial genomes recovered from isolates, single cells, and metagenomes. Accordingly, fast and comprehensive functional gene annotation pipelines are needed to analyze and compare these genomes. Although several approaches...


Bibliographic Details
Main Authors: Ruiz-Perez, Carlos A., Conrad, Roth E., Konstantinidis, Konstantinos T.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7789693/
https://www.ncbi.nlm.nih.gov/pubmed/33407081
http://dx.doi.org/10.1186/s12859-020-03940-5
author Ruiz-Perez, Carlos A.
Conrad, Roth E.
Konstantinidis, Konstantinos T.
collection PubMed
description BACKGROUND: High-throughput sequencing has increased the number of available microbial genomes recovered from isolates, single cells, and metagenomes. Accordingly, fast and comprehensive functional gene annotation pipelines are needed to analyze and compare these genomes. Although several approaches exist for genome annotation, these are typically not designed for easy incorporation into analysis pipelines, do not combine results from different annotation databases or offer easy-to-use summaries of metabolic reconstructions, and typically require large amounts of computing power for high-throughput analysis not available to the average user. RESULTS: Here, we introduce MicrobeAnnotator, a fully automated, easy-to-use pipeline for the comprehensive functional annotation of microbial genomes that combines results from several reference protein databases and returns the matching annotations together with key metadata such as the interlinked identifiers of matching reference proteins from multiple databases [KEGG Orthology (KO), Enzyme Commission (E.C.), Gene Ontology (GO), Pfam, and InterPro]. Further, the functional annotations are summarized into Kyoto Encyclopedia of Genes and Genomes (KEGG) modules as part of a graphical output (heatmap) that allows the user to quickly detect differences among (multiple) query genomes and cluster the genomes based on their metabolic similarity. MicrobeAnnotator is implemented in Python 3 and is freely available under an open-source Artistic License 2.0 from https://github.com/cruizperez/MicrobeAnnotator. CONCLUSIONS: We demonstrated the capabilities of MicrobeAnnotator by annotating 100 Escherichia coli and 78 environmental Candidate Phyla Radiation (CPR) bacterial genomes and comparing the results to those of other popular tools. 
We showed that using multiple annotation databases allows MicrobeAnnotator to recover more annotations per genome than faster tools that use reduced databases, and that it remains computationally efficient for use on personal computers. The output of MicrobeAnnotator can be easily incorporated into other analysis pipelines, while the results of other annotation tools can be seamlessly incorporated into MicrobeAnnotator to generate summary plots.
format Online
Article
Text
id pubmed-7789693
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7789693 2021-01-07 BMC Bioinformatics Software BioMed Central 2021-01-06 /pmc/articles/PMC7789693/ /pubmed/33407081 http://dx.doi.org/10.1186/s12859-020-03940-5 Text en © The Author(s) 2021 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title MicrobeAnnotator: a user-friendly, comprehensive functional annotation pipeline for microbial genomes
topic Software