High-confidence coding and noncoding transcriptome maps

Bibliographic Details
Main authors:	You, Bo-Hyun; Yoon, Sang-Ho; Nam, Jin-Wu
Format:	Online Article Text
Language:	English
Published:	Cold Spring Harbor Laboratory Press, 2017
Subjects:	Method
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5453319/ https://www.ncbi.nlm.nih.gov/pubmed/28396519 http://dx.doi.org/10.1101/gr.214288.116

collection	PubMed
description	The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly large transcriptomes encoded by eukaryotic genomes. However, transcriptome maps remain incomplete, partly because they were mostly reconstructed from RNA-seq reads that lack orientation information (known as unstranded reads) and certain boundary information. Methods that expand the usability of unstranded RNA-seq data, by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts, could significantly improve the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves original assemblies built from stranded and/or unstranded RNA-seq data by orienting unstranded reads using maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. By applying CAFE to large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, we predicted the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should help not only to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes.
id	pubmed-5453319
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
Published in Genome Research (Cold Spring Harbor Laboratory Press), June 2017. © 2017 You et al. Available under a Creative Commons Attribution-NonCommercial 4.0 International License: http://creativecommons.org/licenses/by-nc/4.0/
