
StandEnA: a customizable workflow for standardized annotation and generating a presence–absence matrix of proteins

MOTIVATION: Several genome annotation tools standardize annotation outputs for comparability. During standardization, these tools do not allow user-friendly customization of annotation databases, which limits their flexibility and applicability in downstream analysis. RESULTS: StandEnA is a user-friendl...


Bibliographic Details
Main authors: Chafra, Fatma, Borim Correa, Felipe, Oni, Faith, Konu Karakayalı, Özlen, Stadler, Peter F, Nunes da Rocha, Ulisses
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10336186/
https://www.ncbi.nlm.nih.gov/pubmed/37448812
http://dx.doi.org/10.1093/bioadv/vbad069
author Chafra, Fatma
Borim Correa, Felipe
Oni, Faith
Konu Karakayalı, Özlen
Stadler, Peter F
Nunes da Rocha, Ulisses
collection PubMed
description MOTIVATION: Several genome annotation tools standardize annotation outputs for comparability. During standardization, these tools do not allow user-friendly customization of annotation databases, which limits their flexibility and applicability in downstream analysis. RESULTS: StandEnA is a user-friendly command-line tool for Linux that facilitates the generation of custom databases by retrieving protein sequences from multiple databases. Directed by a user-defined list of standard names, StandEnA retrieves synonyms and uses them to search for corresponding sequences in a set of public databases. The resulting custom databases are used in prokaryotic genome annotation to generate standardized presence–absence matrices and reference files containing standard database identifiers. To showcase StandEnA, we applied it to six metagenome-assembled genomes to analyze three different pathways. AVAILABILITY AND IMPLEMENTATION: StandEnA is open-source software available at https://github.com/mdsufz/StandEnA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
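The presence–absence matrix mentioned in the description can be illustrated with a minimal sketch. This is not StandEnA's code or interface: the function name `presence_absence`, the input layout (pairs of genome identifier and standard protein name), and the sample identifiers are all assumptions made for illustration only.

```python
from collections import defaultdict


def presence_absence(hits, genomes, standard_names):
    """Build a genome-by-protein 0/1 matrix (hypothetical helper).

    hits           -- iterable of (genome_id, standard_name) pairs, e.g.
                      annotation results already mapped to standard names
    genomes        -- ordered list of genome identifiers (rows)
    standard_names -- ordered list of standard protein names (columns)
    """
    found = defaultdict(set)
    for genome_id, name in hits:
        found[genome_id].add(name)  # repeated hits collapse to one entry

    # One row per genome; 1 if the protein was annotated, else 0.
    return {
        g: [1 if name in found[g] else 0 for name in standard_names]
        for g in genomes
    }


# Illustrative input only; these identifiers are made up.
example_hits = [("MAG1", "nifH"), ("MAG2", "amoA"), ("MAG1", "nifH")]
matrix = presence_absence(example_hits, ["MAG1", "MAG2", "MAG3"], ["nifH", "amoA"])
for genome_id, row in matrix.items():
    print(genome_id, row)
```

Fixing the column order from the user-defined list of standard names, rather than from whatever names appear in the hits, keeps the matrices comparable across genomes and runs.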
format Online
Article
Text
id pubmed-10336186
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10336186 2023-07-13 StandEnA: a customizable workflow for standardized annotation and generating a presence–absence matrix of proteins Chafra, Fatma Borim Correa, Felipe Oni, Faith Konu Karakayalı, Özlen Stadler, Peter F Nunes da Rocha, Ulisses Bioinform Adv Application Note Oxford University Press 2023-06-09 /pmc/articles/PMC10336186/ /pubmed/37448812 http://dx.doi.org/10.1093/bioadv/vbad069 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title StandEnA: a customizable workflow for standardized annotation and generating a presence–absence matrix of proteins
topic Application Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10336186/
https://www.ncbi.nlm.nih.gov/pubmed/37448812
http://dx.doi.org/10.1093/bioadv/vbad069