annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing

BACKGROUND: The advancement of hybrid sequencing technologies is increasingly expanding genome assemblies that are often annotated using hybrid sequencing transcriptomics, leading to improved genome characterization and the identification of novel genes and isoforms in a wide variety of organisms. R...

Bibliographic Details
Main authors: Farkas, Carlos; Recabal, Antonia; Mella, Andy; Candia-Herrera, Daniel; Olivero, Maryori González; Haigh, Jody Jonathan; Tarifeño-Saldivia, Estefanía; Caprile, Teresa
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9724561/
https://www.ncbi.nlm.nih.gov/pubmed/36472574
http://dx.doi.org/10.1093/gigascience/giac099
author Farkas, Carlos
Recabal, Antonia
Mella, Andy
Candia-Herrera, Daniel
Olivero, Maryori González
Haigh, Jody Jonathan
Tarifeño-Saldivia, Estefanía
Caprile, Teresa
collection PubMed
description BACKGROUND: The advancement of hybrid sequencing technologies is increasingly expanding genome assemblies that are often annotated using hybrid sequencing transcriptomics, leading to improved genome characterization and the identification of novel genes and isoforms in a wide variety of organisms. RESULTS: We developed an easy-to-use genome-guided transcriptome annotation pipeline that uses assembled transcripts from hybrid sequencing data as input and distinguishes between coding and long non-coding RNAs by integrating several bioinformatic approaches, including gene reconciliation with previous annotations in GTF format. We demonstrated the efficiency of this approach by correctly assembling and annotating all exons from the chicken SCO-spondin gene (containing more than 105 exons) and by identifying genes missing from the chicken reference annotations through homology assignments. CONCLUSIONS: Our method helps to improve the current transcriptome annotation of the chicken brain. Our pipeline, implemented on Anaconda/Nextflow and Docker, is an easy-to-use package that can be applied to a broad range of species, tissues, and research areas, helping to improve and reconcile current annotations. The code and datasets are publicly available at https://github.com/cfarkas/annotate_my_genomes
format Online
Article
Text
id pubmed-9724561
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9724561 2022-12-07 Gigascience Research Oxford University Press 2022-12-06 /pmc/articles/PMC9724561/ /pubmed/36472574 http://dx.doi.org/10.1093/gigascience/giac099 Text en © The Author(s) 2022. Published by Oxford University Press GigaScience.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9724561/
https://www.ncbi.nlm.nih.gov/pubmed/36472574
http://dx.doi.org/10.1093/gigascience/giac099