Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome

Bibliographic Details
Main authors: Howald, Cédric, Tanzer, Andrea, Chrast, Jacqueline, Kokocinski, Felix, Derrien, Thomas, Walters, Nathalie, Gonzalez, Jose M., Frankish, Adam, Aken, Bronwen L., Hourlier, Thibaut, Vogel, Jan-Hinnerk, White, Simon, Searle, Stephen, Harrow, Jennifer, Hubbard, Tim J., Guigó, Roderic, Reymond, Alexandre
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2012
Subjects: Method
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431487/
https://www.ncbi.nlm.nih.gov/pubmed/22955982
http://dx.doi.org/10.1101/gr.134478.111
Collection: PubMed
Description: Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically validated by experiment. Predicted exon–exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we call RT-PCR-seq. Seventy-nine percent of all assessed junctions were confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq also proved efficient for screening gene models predicted from the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is far more sensitive than unbiased transcriptome profiling by RNA sequencing, which is becoming the norm. For example, exon–exon junctions unique to GENCODE-annotated transcripts are five times more likely to be corroborated by our targeted approach than by extensive, large-scale profiling of the human transcriptome. Data sets such as the HBM and ENCODE RNA-seq data fail to sample lowly expressed transcripts. Our targeted RT-PCR-seq approach also has the advantage of identifying novel exons of known genes: we discovered unannotated exons in ∼11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that cataloging all of the genic elements encoded in the human genome will require a coordinated effort combining unbiased and targeted approaches, such as RNA-seq and RT-PCR-seq.
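The description mentions two outcomes of RT-PCR-seq: confirming predicted exon–exon junctions and discovering unannotated exons inside assessed introns. The following is a minimal, hypothetical sketch of that bookkeeping, not the authors' pipeline. It assumes sequenced amplicon reads have already been aligned to the genome and reduced to their aligned blocks; the names `read_introns` and `assess_junctions`, the coordinate convention, and the `min_reads` support threshold are all invented for illustration.

```python
from collections import Counter

# Hypothetical data model (not from the paper): an intron is
# (chrom, first_intronic_base, last_intronic_base), 1-based inclusive,
# and a read is (chrom, [(block_start, block_end), ...]) of aligned exon blocks.
Intron = tuple[str, int, int]
Read = tuple[str, list[tuple[int, int]]]


def read_introns(read: Read) -> list[Intron]:
    """Return the introns implied by the gaps between consecutive aligned blocks."""
    chrom, blocks = read
    return [(chrom, left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(blocks, blocks[1:])]


def assess_junctions(predicted: list[Intron], reads: list[Read], min_reads: int = 2):
    """Tally read support for predicted junctions and find exons hidden inside them.

    Returns (confirmed junctions, fraction confirmed, novel internal exons).
    The min_reads threshold is an assumed parameter, not the paper's criterion.
    """
    per_read = [read_introns(r) for r in reads]
    # Each read counts at most once per intron it spans.
    support = Counter(i for introns in per_read for i in set(introns))
    confirmed = [p for p in predicted if support[p] >= min_reads]

    novel_exons: set[tuple[str, int, int]] = set()
    for chrom, start, end in predicted:
        for introns in per_read:
            # Read introns that begin at the predicted donor site.
            starts = [k for k, (c, s, _) in enumerate(introns) if c == chrom and s == start]
            for k in starts:
                for m in range(k + 1, len(introns)):
                    if introns[m][2] == end:
                        # The read splices from the predicted donor to the predicted
                        # acceptor via extra exons; each block in between is novel.
                        for t in range(k, m):
                            novel_exons.add((chrom, introns[t][2] + 1, introns[t + 1][1] - 1))
                        break
    fraction = len(confirmed) / len(predicted) if predicted else 0.0
    return confirmed, fraction, novel_exons
```

For example, two reads with blocks (50, 100) and (200, 250) confirm a predicted intron chr1:101-199, while a single read with blocks (250, 300), (340, 360) and (400, 450) reveals an exon chr1:340-360 inside a predicted intron chr1:301-399. Counting each read once per intron keeps a single read from inflating support for a junction it happens to report twice.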
ID: pubmed-3431487
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Genome Res (Method), 2012-09
Copyright: © 2012, Published by Cold Spring Harbor Laboratory Press. This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.