
Event Analysis: Using Transcript Events To Improve Estimates of Abundance in RNA-seq Data

Alternative splicing leverages genomic content by allowing the synthesis of multiple transcripts and, by implication, protein isoforms, from a single gene. However, estimating the abundance of transcripts produced in a given tissue from short sequencing reads is difficult and can result in both the...


Bibliographic Details
Main Authors:	Newman, Jeremy R. B.; Concannon, Patrick; Tardaguila, Manuel; Conesa, Ana; McIntyre, Lauren M.
Format:	Online Article Text
Language:	English
Published:	Genetics Society of America 2018
Subjects:	Investigations
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6118309/ https://www.ncbi.nlm.nih.gov/pubmed/30021829 http://dx.doi.org/10.1534/g3.118.200373

author	Newman, Jeremy R. B.; Concannon, Patrick; Tardaguila, Manuel; Conesa, Ana; McIntyre, Lauren M.
author_sort	Newman, Jeremy R. B.
collection	PubMed
description	Alternative splicing leverages genomic content by allowing the synthesis of multiple transcripts and, by implication, protein isoforms, from a single gene. However, estimating the abundance of transcripts produced in a given tissue from short sequencing reads is difficult and can result in both the construction of transcripts that do not exist and the failure to identify true transcripts. An alternative approach is to catalog the events that make up isoforms (splice junctions and exons). We present here the Event Analysis (EA) approach, where we project transcripts onto the genome and identify overlapping/unique regions and junctions. In addition, all possible logical junctions are assembled into a catalog. Transcripts are filtered before quantitation based on simple measures: the proportion of the events detected, and the coverage. We find that mapping to a junction catalog is more efficient at detecting novel junctions than mapping in a splice-aware manner. We identify 99.8% of true transcripts while iReckon identifies 82% of the true transcripts and creates more transcripts not included in the simulation than were initially used in the simulation. Using PacBio Iso-seq data from a mouse neural progenitor cell model, EA detects 60% of the novel junctions that are combinations of existing exons while only 43% are detected by STAR. EA further detects ∼5,000 annotated junctions missed by STAR. Filtering transcripts based on the proportion of the transcript detected and the number of reads on average supporting that transcript captures 95% of the PacBio transcriptome. Filtering the reference transcriptome before quantitation results in a more stable estimate of isoform abundance, with improved correlation between replicates. This was particularly evident when EA was applied to an RNA-seq study of type 1 diabetes (T1D), where the coefficient of variation among subjects (n = 81) in the transcript abundance estimates was substantially reduced compared to the estimation using the full reference. EA focuses on individual transcriptional events. These events can be quantitated and analyzed directly or used to identify the probable set of expressed transcripts. Simple rules based on detected events and coverage used in filtering result in a dramatic improvement in isoform estimation without the use of ancillary data (e.g., ChIP, long reads) that may not be available for many studies.
format	Online Article Text
id	pubmed-6118309
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Genetics Society of America
record_format	MEDLINE/PubMed
spelling	pubmed-6118309 2018-09-04 Event Analysis: Using Transcript Events To Improve Estimates of Abundance in RNA-seq Data Newman, Jeremy R. B. Concannon, Patrick Tardaguila, Manuel Conesa, Ana McIntyre, Lauren M. G3 (Bethesda) Investigations Genetics Society of America 2018-07-18 /pmc/articles/PMC6118309/ /pubmed/30021829 http://dx.doi.org/10.1534/g3.118.200373 Text en Copyright © 2018 Newman et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Event Analysis: Using Transcript Events To Improve Estimates of Abundance in RNA-seq Data
title_sort	event analysis: using transcript events to improve estimates of abundance in rna-seq data
topic	Investigations
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6118309/ https://www.ncbi.nlm.nih.gov/pubmed/30021829 http://dx.doi.org/10.1534/g3.118.200373
work_keys_str_mv	AT newmanjeremyrb eventanalysisusingtranscripteventstoimproveestimatesofabundanceinrnaseqdata AT concannonpatrick eventanalysisusingtranscripteventstoimproveestimatesofabundanceinrnaseqdata AT tardaguilamanuel eventanalysisusingtranscripteventstoimproveestimatesofabundanceinrnaseqdata AT conesaana eventanalysisusingtranscripteventstoimproveestimatesofabundanceinrnaseqdata AT mcintyrelaurenm eventanalysisusingtranscripteventstoimproveestimatesofabundanceinrnaseqdata
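
The abstract describes filtering transcripts before quantitation using two simple measures: the proportion of a transcript's events that are detected, and its coverage. The Python sketch below illustrates that idea only; it is not the authors' implementation. The `Transcript` fields, the function names, and both threshold values are hypothetical choices made for illustration, since the record does not state the actual cutoffs.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    events_total: int      # annotated events (exonic regions and junctions)
    events_detected: int   # events with read support in the sample
    mean_coverage: float   # average number of reads supporting the transcript


def keep_transcript(t, min_prop_detected=0.75, min_mean_coverage=5.0):
    """Return True if a transcript passes both filters.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    if t.events_total == 0:
        return False  # nothing to detect, so no evidence for the transcript
    prop_detected = t.events_detected / t.events_total
    return prop_detected >= min_prop_detected and t.mean_coverage >= min_mean_coverage


def filter_reference(transcripts, **thresholds):
    """Keep only the transcripts that pass the event and coverage filters."""
    return [t for t in transcripts if keep_transcript(t, **thresholds)]


# Toy data: only tx_A has most events detected and adequate coverage.
reference = [
    Transcript("tx_A", events_total=10, events_detected=10, mean_coverage=12.0),
    Transcript("tx_B", events_total=8, events_detected=3, mean_coverage=20.0),
    Transcript("tx_C", events_total=5, events_detected=5, mean_coverage=1.5),
]
kept = filter_reference(reference)
print([t.name for t in kept])  # ['tx_A']
```

In this sketch the reduced list `kept` would stand in for the filtered reference passed to a quantitation step; how events are counted and how coverage is averaged are left open here.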
