Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome

Recent sequencing technologies that allow massive parallel production of short reads are the method of choice for transcriptome analysis. Particularly, digital gene expression (DGE) technologies produce a large dynamic range of expression data by generating short tag signatures for each cell transcr...

Bibliographic Details
Main authors: Philippe, Nicolas; Bou Samra, Elias; Boureux, Anthony; Mancheron, Alban; Rufflé, Florence; Bai, Qiang; De Vos, John; Rivals, Eric; Commes, Thérèse
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3950697/
https://www.ncbi.nlm.nih.gov/pubmed/24357408
http://dx.doi.org/10.1093/nar/gkt1300
author Philippe, Nicolas
Bou Samra, Elias
Boureux, Anthony
Mancheron, Alban
Rufflé, Florence
Bai, Qiang
De Vos, John
Rivals, Eric
Commes, Thérèse
author_sort Philippe, Nicolas
collection PubMed
description Recent sequencing technologies that allow massive parallel production of short reads are the method of choice for transcriptome analysis. Particularly, digital gene expression (DGE) technologies produce a large dynamic range of expression data by generating short tag signatures for each cell transcript. These tags can be mapped back to a reference genome to identify new transcribed regions that can be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and species-comparison to explore new transcriptional regions and their specific biological features, particularly tissue expression or conservation. We analysed tags from a large DGE data set (designated as ‘TranscriRef’). We then annotated 750 000 tags that were uniquely mapped to the human genome according to Ensembl. We retained transcripts originating from both DNA strands and categorized tags corresponding to protein-coding genes, antisense, intronic- or intergenic-transcribed regions and computed their overlap with annotated non-coding transcripts. Using this bioinformatics approach, we identified ∼34 000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated using sequencing data from human pluripotent stem cells for biological validation, the method could be easily applied for the selection of tissue-specific candidate transcripts. DigitagCT is available at http://cractools.gforge.inria.fr/softwares/digitagct.
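The categorization step described in the abstract (assigning each uniquely mapped tag to a protein-coding, antisense, intronic or intergenic class) can be illustrated with a minimal sketch. This is not DigitagCT's implementation: the `Gene` record, the `classify_tag` function, the 1-based inclusive coordinates and the precedence of the classes are all assumptions made for illustration, and a tag is reduced to a single genomic position.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    """Hypothetical annotation record; coordinates are 1-based and inclusive."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"
    exons: list = field(default_factory=list)  # list of (start, end) tuples


def classify_tag(chrom, pos, strand, genes):
    """Assign a mapped tag to one of four illustrative categories.

    Assumed precedence: same-strand exon > same-strand intron >
    opposite-strand gene > no overlapping gene.
    """
    # Genes on the same chromosome whose span contains the tag position.
    overlapping = [g for g in genes
                   if g.chrom == chrom and g.start <= pos <= g.end]
    sense = [g for g in overlapping if g.strand == strand]

    # Same strand and inside an annotated exon.
    if any(s <= pos <= e for g in sense for (s, e) in g.exons):
        return "protein_coding"
    # Same strand, inside the gene span but outside every exon.
    if sense:
        return "intronic"
    # Only genes on the opposite strand overlap the tag.
    if overlapping:
        return "antisense"
    # No annotated gene overlaps the tag on either strand.
    return "intergenic"


# Toy annotation: one gene on chr1 with two exons and one intron.
genes = [Gene("chr1", 100, 900, "+", [(100, 200), (700, 900)])]
```

With the toy annotation above, a `+` strand tag at position 150 falls in an exon, one at position 500 falls in the intron, a `-` strand tag at position 500 is antisense, and a tag at position 2000 is intergenic.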
format Online
Article
Text
id pubmed-3950697
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3950697 2014-03-12 Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome Philippe, Nicolas Bou Samra, Elias Boureux, Anthony Mancheron, Alban Rufflé, Florence Bai, Qiang De Vos, John Rivals, Eric Commes, Thérèse Nucleic Acids Res
Oxford University Press 2014-03 2013-12-18 /pmc/articles/PMC3950697/ /pubmed/24357408 http://dx.doi.org/10.1093/nar/gkt1300 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3950697/
https://www.ncbi.nlm.nih.gov/pubmed/24357408
http://dx.doi.org/10.1093/nar/gkt1300