Incorporating RNA-seq data into the zebrafish Ensembl genebuild

Ensembl gene annotation provides a comprehensive catalog of transcripts aligned to the reference sequence. It relies on publicly available species-specific and orthologous transcripts plus their inferred protein sequence. The accuracy of gene models is improved by increasing the species-specific com...

Bibliographic Details
Main authors: Collins, John E., White, Simon, Searle, Stephen M.J., Stemple, Derek L.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3460200/
https://www.ncbi.nlm.nih.gov/pubmed/22798491
http://dx.doi.org/10.1101/gr.137901.112
_version_ 1782244924392996864
author Collins, John E.
White, Simon
Searle, Stephen M.J.
Stemple, Derek L.
collection PubMed
description Ensembl gene annotation provides a comprehensive catalog of transcripts aligned to the reference sequence. It relies on publicly available species-specific and orthologous transcripts plus their inferred protein sequence. The accuracy of gene models is improved by increasing the species-specific component that can be cost-effectively achieved using RNA-seq. Two zebrafish gene annotations are presented in Ensembl version 62 built on the Zv9 reference sequence. Firstly, RNA-seq data from five tissues and seven developmental stages were assembled into 25,748 gene models. A 3′-end capture and sequencing protocol was developed to predict the 3′ ends of transcripts, and 46.1% of the original models were subsequently refined. Secondly, a standard Ensembl genebuild, incorporating carefully filtered elements from the RNA-seq-only build, followed by a merge with the manually curated VEGA database, produced a comprehensive annotation of 26,152 genes represented by 51,569 transcripts. The RNA-seq-only and the Ensembl/VEGA genebuilds contribute contrasting elements to the final genebuild. The RNA-seq genebuild was used to adjust intron/exon boundaries of orthologous defined models, confirm their expression, and improve 3′ untranslated regions. Importantly, the inferred protein alignments within the Ensembl genebuild conferred proof of model contiguity for the RNA-seq models. The zebrafish gene annotation has been enhanced by the incorporation of RNA-seq data and the pipeline will be used for other organisms. Organisms with little species-specific cDNA data will generally benefit the most.
format Online
Article
Text
id pubmed-3460200
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-3460200 2012-10-06 Incorporating RNA-seq data into the zebrafish Ensembl genebuild Collins, John E. White, Simon Searle, Stephen M.J. Stemple, Derek L. Genome Res Resource
Cold Spring Harbor Laboratory Press 2012-10 /pmc/articles/PMC3460200/ /pubmed/22798491 http://dx.doi.org/10.1101/gr.137901.112 Text en © 2012, Published by Cold Spring Harbor Laboratory Press This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.
title Incorporating RNA-seq data into the zebrafish Ensembl genebuild
topic Resource
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3460200/
https://www.ncbi.nlm.nih.gov/pubmed/22798491
http://dx.doi.org/10.1101/gr.137901.112