AnnotaPipeline: An integrated tool to annotate eukaryotic proteins using multi-omics data
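Main authors: | Maia, Guilherme Augusto; Filho, Vilmar Benetti; Kawagoe, Eric Kazuo; Teixeira Soratto, Tatiany Aparecida; Moreira, Renato Simões; Grisard, Edmundo Carlos; Wagner, Glauber |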
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9723129/ https://www.ncbi.nlm.nih.gov/pubmed/36482896 http://dx.doi.org/10.3389/fgene.2022.1020100 |
author | Maia, Guilherme Augusto; Filho, Vilmar Benetti; Kawagoe, Eric Kazuo; Teixeira Soratto, Tatiany Aparecida; Moreira, Renato Simões; Grisard, Edmundo Carlos; Wagner, Glauber |
collection | PubMed |
description | Assignment of gene function has been a crucial, laborious, and time-consuming step in genomics. Because a variety of sequencing platforms generate increasing amounts of data, manual annotation is no longer feasible. Thus, an integrated, automated pipeline that uses experimental data to validate in silico predictions of gene function is of utmost relevance. Here, we present a computational workflow named AnnotaPipeline that integrates distinct software and data types in a proteogenomic approach to annotate and validate predicted features in genomic sequences. Starting from (i) nucleotide or (ii) protein sequences in FASTA format, or (iii) structural annotation files (GFF3), users can also input FASTQ RNA-seq data and MS/MS data in mzXML or similar formats. The pipeline uses both transcriptomic and proteomic information to corroborate annotations and validate gene predictions, providing transcription and expression evidence for functional annotation. Reannotation of the available Arabidopsis thaliana, Caenorhabditis elegans, Candida albicans, Trypanosoma cruzi, and Trypanosoma rangeli genomes with AnnotaPipeline resulted in a higher proportion of annotated proteins and a lower proportion of hypothetical proteins compared with the annotations publicly available for these organisms. AnnotaPipeline is a Unix-based pipeline developed in Python and is available at: https://github.com/bioinformatics-ufsc/AnnotaPipeline. |
format | Online Article Text |
id | pubmed-9723129 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
journal | Front Genet |
date | 2022-11-22 |
rights | Copyright © 2022 Maia, Filho, Kawagoe, Teixeira Soratto, Moreira, Grisard and Wagner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | AnnotaPipeline: An integrated tool to annotate eukaryotic proteins using multi-omics data |
topic | Genetics |