AnnotaPipeline: An integrated tool to annotate eukaryotic proteins using multi-omics data

Assignment of gene function has been a crucial, laborious, and time-consuming step in genomics. Because a variety of sequencing platforms generate increasing amounts of data, manual annotation is no longer feasible. Thus, the need for an integrated, automated pipeline allowing the use of experi...

Bibliographic Details
Main authors: Maia, Guilherme Augusto, Filho, Vilmar Benetti, Kawagoe, Eric Kazuo, Teixeira Soratto, Tatiany Aparecida, Moreira, Renato Simões, Grisard, Edmundo Carlos, Wagner, Glauber
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9723129/
https://www.ncbi.nlm.nih.gov/pubmed/36482896
http://dx.doi.org/10.3389/fgene.2022.1020100
author Maia, Guilherme Augusto
Filho, Vilmar Benetti
Kawagoe, Eric Kazuo
Teixeira Soratto, Tatiany Aparecida
Moreira, Renato Simões
Grisard, Edmundo Carlos
Wagner, Glauber
author_sort Maia, Guilherme Augusto
collection PubMed
description Assignment of gene function has been a crucial, laborious, and time-consuming step in genomics. Because a variety of sequencing platforms generate increasing amounts of data, manual annotation is no longer feasible. Thus, an integrated, automated pipeline that allows experimental data to be used to validate in silico predictions of gene function is of utmost relevance. Here, we present a computational workflow named AnnotaPipeline that integrates distinct software and data types in a proteogenomic approach to annotate and validate predicted features in genomic sequences. Starting from FASTA (i) nucleotide or (ii) protein sequences or (iii) structural annotation files (GFF3), users can also input FASTQ RNA-seq data and MS/MS data in mzXML or similar formats; the pipeline uses both transcriptomic and proteomic information to corroborate annotations and validate gene prediction, providing transcription and expression evidence for functional annotation. Reannotation of the available Arabidopsis thaliana, Caenorhabditis elegans, Candida albicans, Trypanosoma cruzi, and Trypanosoma rangeli genomes was performed using AnnotaPipeline, resulting in a higher proportion of annotated proteins and a reduced proportion of hypothetical proteins when compared to the annotations publicly available for these organisms. AnnotaPipeline is a Unix-based pipeline developed using Python and is available at: https://github.com/bioinformatics-ufsc/AnnotaPipeline.
format Online
Article
Text
id pubmed-9723129
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9723129 2022-12-07 Front Genet Genetics Frontiers Media S.A. 2022-11-22 /pmc/articles/PMC9723129/ /pubmed/36482896 http://dx.doi.org/10.3389/fgene.2022.1020100 Text en Copyright © 2022 Maia, Filho, Kawagoe, Teixeira Soratto, Moreira, Grisard and Wagner. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title AnnotaPipeline: An integrated tool to annotate eukaryotic proteins using multi-omics data
topic Genetics