StandEnA: a customizable workflow for standardized annotation and generating a presence–absence matrix of proteins
Main authors: | , , , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10336186/ https://www.ncbi.nlm.nih.gov/pubmed/37448812 http://dx.doi.org/10.1093/bioadv/vbad069 |
Summary: | MOTIVATION: Several genome annotation tools standardize annotation outputs for comparability. During standardization, these tools do not allow user-friendly customization of annotation databases, which limits their flexibility and applicability in downstream analysis. RESULTS: StandEnA is a user-friendly command-line tool for Linux that facilitates the generation of custom databases by retrieving protein sequences from multiple databases. Directed by a user-defined list of standard names, StandEnA retrieves synonyms to search for corresponding sequences in a set of public databases. Custom databases are used in prokaryotic genome annotation to generate standardized presence–absence matrices and reference files containing standard database identifiers. To showcase StandEnA, we applied it to six metagenome-assembled genomes to analyze three different pathways. AVAILABILITY AND IMPLEMENTATION: StandEnA is open-source software available at https://github.com/mdsufz/StandEnA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online. |
---|