annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing
BACKGROUND: The advancement of hybrid sequencing technologies is increasingly expanding genome assemblies that are often annotated using hybrid sequencing transcriptomics, leading to improved genome characterization and the identification of novel genes and isoforms in a wide variety of organisms. R...
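Main authors: | Farkas, Carlos; Recabal, Antonia; Mella, Andy; Candia-Herrera, Daniel; Olivero, Maryori González; Haigh, Jody Jonathan; Tarifeño-Saldivia, Estefanía; Caprile, Teresa |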
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9724561/ https://www.ncbi.nlm.nih.gov/pubmed/36472574 http://dx.doi.org/10.1093/gigascience/giac099 |
_version_ | 1784844446234312704 |
---|---|
author | Farkas, Carlos Recabal, Antonia Mella, Andy Candia-Herrera, Daniel Olivero, Maryori González Haigh, Jody Jonathan Tarifeño-Saldivia, Estefanía Caprile, Teresa |
author_facet | Farkas, Carlos Recabal, Antonia Mella, Andy Candia-Herrera, Daniel Olivero, Maryori González Haigh, Jody Jonathan Tarifeño-Saldivia, Estefanía Caprile, Teresa |
author_sort | Farkas, Carlos |
collection | PubMed |
description | BACKGROUND: The advancement of hybrid sequencing technologies is increasingly expanding genome assemblies that are often annotated using hybrid sequencing transcriptomics, leading to improved genome characterization and the identification of novel genes and isoforms in a wide variety of organisms. RESULTS: We developed an easy-to-use genome-guided transcriptome annotation pipeline that uses assembled transcripts from hybrid sequencing data as input and distinguishes between coding and long non-coding RNAs by integration of several bioinformatic approaches, including gene reconciliation with previous annotations in GTF format. We demonstrated the efficiency of this approach by correctly assembling and annotating all exons from the chicken SCO-spondin gene (containing more than 105 exons), including the identification of missing genes in the chicken reference annotations by homology assignments. CONCLUSIONS: Our method helps to improve the current transcriptome annotation of the chicken brain. Our pipeline, implemented on Anaconda/Nextflow and Docker is an easy-to-use package that can be applied to a broad range of species, tissues, and research areas helping to improve and reconcile current annotations. The code and datasets are publicly available at https://github.com/cfarkas/annotate_my_genomes |
format | Online Article Text |
id | pubmed-9724561 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-97245612022-12-07 annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing Farkas, Carlos Recabal, Antonia Mella, Andy Candia-Herrera, Daniel Olivero, Maryori González Haigh, Jody Jonathan Tarifeño-Saldivia, Estefanía Caprile, Teresa Gigascience Research BACKGROUND: The advancement of hybrid sequencing technologies is increasingly expanding genome assemblies that are often annotated using hybrid sequencing transcriptomics, leading to improved genome characterization and the identification of novel genes and isoforms in a wide variety of organisms. RESULTS: We developed an easy-to-use genome-guided transcriptome annotation pipeline that uses assembled transcripts from hybrid sequencing data as input and distinguishes between coding and long non-coding RNAs by integration of several bioinformatic approaches, including gene reconciliation with previous annotations in GTF format. We demonstrated the efficiency of this approach by correctly assembling and annotating all exons from the chicken SCO-spondin gene (containing more than 105 exons), including the identification of missing genes in the chicken reference annotations by homology assignments. CONCLUSIONS: Our method helps to improve the current transcriptome annotation of the chicken brain. Our pipeline, implemented on Anaconda/Nextflow and Docker is an easy-to-use package that can be applied to a broad range of species, tissues, and research areas helping to improve and reconcile current annotations. The code and datasets are publicly available at https://github.com/cfarkas/annotate_my_genomes Oxford University Press 2022-12-06 /pmc/articles/PMC9724561/ /pubmed/36472574 http://dx.doi.org/10.1093/gigascience/giac099 Text en © The Author(s) 2022. Published by Oxford University Press GigaScience. 
https://creativecommons.org/licenses/by/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Farkas, Carlos Recabal, Antonia Mella, Andy Candia-Herrera, Daniel Olivero, Maryori González Haigh, Jody Jonathan Tarifeño-Saldivia, Estefanía Caprile, Teresa annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing |
title | annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing |
title_full | annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing |
title_fullStr | annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing |
title_full_unstemmed | annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing |
title_short | annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing |
title_sort | annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid rna sequencing |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9724561/ https://www.ncbi.nlm.nih.gov/pubmed/36472574 http://dx.doi.org/10.1093/gigascience/giac099 |
work_keys_str_mv | AT farkascarlos annotatemygenomesaneasytousepipelinetoimprovegenomeannotationanduncoverneglectedgenesbyhybridrnasequencing AT recabalantonia annotatemygenomesaneasytousepipelinetoimprovegenomeannotationanduncoverneglectedgenesbyhybridrnasequencing AT mellaandy annotatemygenomesaneasytousepipelinetoimprovegenomeannotationanduncoverneglectedgenesbyhybridrnasequencing AT candiaherreradaniel annotatemygenomesaneasytousepipelinetoimprovegenomeannotationanduncoverneglectedgenesbyhybridrnasequencing AT oliveromaryorigonzalez annotatemygenomesaneasytousepipelinetoimprovegenomeannotationanduncoverneglectedgenesbyhybridrnasequencing AT haighjodyjonathan annotatemygenomesaneasytousepipelinetoimprovegenomeannotationanduncoverneglectedgenesbyhybridrnasequencing AT tarifenosaldiviaestefania annotatemygenomesaneasytousepipelinetoimprovegenomeannotationanduncoverneglectedgenesbyhybridrnasequencing AT caprileteresa annotatemygenomesaneasytousepipelinetoimprovegenomeannotationanduncoverneglectedgenesbyhybridrnasequencing |