
TAGADA: a scalable pipeline to improve genome annotations with RNA-seq data

Genome annotation plays a crucial role in providing a comprehensive catalog of genes and transcripts for a particular species. As research projects generate new transcriptome data worldwide, integrating this information into existing annotations becomes essential. However, most bioinformatics pipeline...


Bibliographic Details
Main authors:	Kurylo, Cyril; Guyomar, Cervin; Foissac, Sylvain; Djebali, Sarah
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Standard Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10578202/ https://www.ncbi.nlm.nih.gov/pubmed/37850035 http://dx.doi.org/10.1093/nargab/lqad089

_version_	1785121468463448064
author	Kurylo, Cyril Guyomar, Cervin Foissac, Sylvain Djebali, Sarah
author_facet	Kurylo, Cyril Guyomar, Cervin Foissac, Sylvain Djebali, Sarah
author_sort	Kurylo, Cyril
collection	PubMed
description	Genome annotation plays a crucial role in providing a comprehensive catalog of genes and transcripts for a particular species. As research projects generate new transcriptome data worldwide, integrating this information into existing annotations becomes essential. However, most bioinformatics pipelines are limited in their ability to effectively and consistently update annotations using new RNA-seq data. Here we introduce TAGADA, an RNA-seq pipeline for Transcripts And Genes Assembly, Deconvolution, and Analysis. Given a genomic sequence, a reference annotation and RNA-seq reads, TAGADA enhances existing gene models by generating an improved annotation. It also computes expression values for both the reference and novel annotation, identifies long non-coding transcripts (lncRNAs), and provides a comprehensive quality control report. Developed using Nextflow DSL2, TAGADA offers user-friendly functionalities and ensures reproducibility across different computing platforms through its containerized environment. In this study, we demonstrate the efficacy of TAGADA using RNA-seq data from the GENE-SWiTCH project alongside chicken and pig genome annotations as references. Results indicate that TAGADA can substantially increase the number of annotated transcripts by approximately [Formula: see text] in these species. Furthermore, we illustrate how TAGADA can integrate Illumina NovaSeq short reads with PacBio Iso-Seq long reads, showcasing its versatility. TAGADA is available at github.com/FAANG/analysis-TAGADA.
format	Online Article Text
id	pubmed-10578202
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-10578202 2023-10-17 NAR Genom Bioinform Standard Article Oxford University Press 2023-10-16 /pmc/articles/PMC10578202/ /pubmed/37850035 http://dx.doi.org/10.1093/nargab/lqad089 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	TAGADA: a scalable pipeline to improve genome annotations with RNA-seq data
topic	Standard Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10578202/ https://www.ncbi.nlm.nih.gov/pubmed/37850035 http://dx.doi.org/10.1093/nargab/lqad089
