Knowledge-Based Reconstruction of mRNA Transcripts with Short Sequencing Reads for Transcriptome Research

While most transcriptome analyses in high-throughput clinical studies focus on gene level expression, the existence of alternative isoforms of gene transcripts is a major source of the diversity in the biological functionalities of the human genome. It is, therefore, essential to annotate isoforms o...

Bibliographic Details
Main Authors: Seok, Junhee, Xu, Weihong, Jiang, Hui, Davis, Ronald W., Xiao, Wenzhong
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3270033/
https://www.ncbi.nlm.nih.gov/pubmed/22312447
http://dx.doi.org/10.1371/journal.pone.0031440
author Seok, Junhee
Xu, Weihong
Jiang, Hui
Davis, Ronald W.
Xiao, Wenzhong
author_facet Seok, Junhee
Xu, Weihong
Jiang, Hui
Davis, Ronald W.
Xiao, Wenzhong
author_sort Seok, Junhee
collection PubMed
description While most transcriptome analyses in high-throughput clinical studies focus on gene level expression, the existence of alternative isoforms of gene transcripts is a major source of the diversity in the biological functionalities of the human genome. It is, therefore, essential to annotate isoforms of gene transcripts for genome-wide transcriptome studies. Recently developed mRNA sequencing technology presents an unprecedented opportunity to discover new forms of transcripts, and at the same time brings bioinformatic challenges due to its short read length and incomplete coverage for the transcripts. In this work, we proposed a computational approach to reconstruct new mRNA transcripts from short sequencing reads with reference information of known transcripts in existing databases. The prior knowledge helped to define exon boundaries and fill in the transcript regions not covered by sequencing data. This approach was demonstrated using a deep sequencing data set of human muscle tissue with transcript annotations in RefSeq as prior knowledge. We identified 2,973 junctions, 7,471 exons, and 7,571 transcripts not previously annotated in RefSeq. 73% of these new transcripts found supports from UCSC Known Genes, Ensembl or EST transcript annotations. In addition, the reconstructed transcripts were much longer than those from de novo approaches that assume no prior knowledge. These previously un-annotated transcripts can be integrated with known transcript annotations to improve both the design of microarrays and the follow-up analyses of isoform expression. The overall results demonstrated that incorporating transcript annotations from genomic databases significantly helps the reconstruction of novel transcripts from short sequencing reads for transcriptome research.
format Online
Article
Text
id pubmed-3270033
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3270033 2012-02-06 Knowledge-Based Reconstruction of mRNA Transcripts with Short Sequencing Reads for Transcriptome Research Seok, Junhee Xu, Weihong Jiang, Hui Davis, Ronald W. Xiao, Wenzhong PLoS One Research Article While most transcriptome analyses in high-throughput clinical studies focus on gene level expression, the existence of alternative isoforms of gene transcripts is a major source of the diversity in the biological functionalities of the human genome. It is, therefore, essential to annotate isoforms of gene transcripts for genome-wide transcriptome studies. Recently developed mRNA sequencing technology presents an unprecedented opportunity to discover new forms of transcripts, and at the same time brings bioinformatic challenges due to its short read length and incomplete coverage for the transcripts. In this work, we proposed a computational approach to reconstruct new mRNA transcripts from short sequencing reads with reference information of known transcripts in existing databases. The prior knowledge helped to define exon boundaries and fill in the transcript regions not covered by sequencing data. This approach was demonstrated using a deep sequencing data set of human muscle tissue with transcript annotations in RefSeq as prior knowledge. We identified 2,973 junctions, 7,471 exons, and 7,571 transcripts not previously annotated in RefSeq. 73% of these new transcripts found supports from UCSC Known Genes, Ensembl or EST transcript annotations. In addition, the reconstructed transcripts were much longer than those from de novo approaches that assume no prior knowledge. These previously un-annotated transcripts can be integrated with known transcript annotations to improve both the design of microarrays and the follow-up analyses of isoform expression. The overall results demonstrated that incorporating transcript annotations from genomic databases significantly helps the reconstruction of novel transcripts from short sequencing reads for transcriptome research. Public Library of Science 2012-02-01 /pmc/articles/PMC3270033/ /pubmed/22312447 http://dx.doi.org/10.1371/journal.pone.0031440 Text en Seok et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Seok, Junhee
Xu, Weihong
Jiang, Hui
Davis, Ronald W.
Xiao, Wenzhong
Knowledge-Based Reconstruction of mRNA Transcripts with Short Sequencing Reads for Transcriptome Research
title Knowledge-Based Reconstruction of mRNA Transcripts with Short Sequencing Reads for Transcriptome Research
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3270033/
https://www.ncbi.nlm.nih.gov/pubmed/22312447
http://dx.doi.org/10.1371/journal.pone.0031440