
Event Analysis: Using Transcript Events To Improve Estimates of Abundance in RNA-seq Data


Bibliographic Details
Main authors: Newman, Jeremy R. B.; Concannon, Patrick; Tardaguila, Manuel; Conesa, Ana; McIntyre, Lauren M.
Format: Online; Article; Text
Language: English
Published: Genetics Society of America 2018
Subjects: Investigations
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6118309/
https://www.ncbi.nlm.nih.gov/pubmed/30021829
http://dx.doi.org/10.1534/g3.118.200373
Abstract: Alternative splicing leverages genomic content by allowing the synthesis of multiple transcripts and, by implication, protein isoforms, from a single gene. However, estimating the abundance of transcripts produced in a given tissue from short sequencing reads is difficult and can result both in the construction of transcripts that do not exist and in the failure to identify true transcripts. An alternative approach is to catalog the events that make up isoforms (splice junctions and exons). We present here the Event Analysis (EA) approach, in which we project transcripts onto the genome and identify overlapping and unique regions and junctions. In addition, all possible logical junctions are assembled into a catalog. Transcripts are filtered before quantitation based on two simple measures: the proportion of their events detected, and their coverage. We find that mapping to a junction catalog is more efficient at detecting novel junctions than splice-aware mapping. In simulated data, we identify 99.8% of true transcripts, whereas iReckon identifies 82% of them and creates more transcripts not included in the simulation than were initially used in the simulation. Using PacBio Iso-seq data from a mouse neural progenitor cell model, EA detects 60% of the novel junctions that are combinations of existing exons, while STAR detects only 43%. EA further detects ∼5,000 annotated junctions missed by STAR. Filtering transcripts based on the proportion of the transcript detected and the average number of reads supporting that transcript captures 95% of the PacBio transcriptome. Filtering the reference transcriptome before quantitation results in more stable estimates of isoform abundance, with improved correlation between replicates. This was particularly evident when EA was applied to an RNA-seq study of type 1 diabetes (T1D), where the coefficient of variation among subjects (n = 81) in the transcript abundance estimates was substantially reduced compared with estimation using the full reference. EA focuses on individual transcriptional events. These events can be quantified and analyzed directly, or used to identify the probable set of expressed transcripts. Simple filtering rules based on detected events and coverage dramatically improve isoform estimation without the use of ancillary data (e.g., ChIP, long reads) that may not be available for many studies.
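The abstract describes two simple steps: assembling a catalog of all logical junctions from a gene's exons, and filtering transcripts by the proportion of their events detected and by their coverage. The Python sketch below illustrates both ideas under stated assumptions; it is not the EA implementation. It assumes a single gene on the + strand, treats whole exons as exonic events (EA itself works with overlapping and unique exonic regions), takes a "logical" junction to be any exon end paired with any downstream exon start, and uses the hypothetical thresholds `min_reads`, `min_proportion` and `min_mean_coverage`, whose values are illustrative rather than the paper's. The event encoding and all function names are likewise hypothetical.

```python
from itertools import product

# Hypothetical encoding (not EA's actual format): an event is (kind, a, b)
# with kind "exon" (a=start, b=end) or "junction" (a=donor, b=acceptor).
# Coordinates are 1-based and all exons belong to one gene on the + strand.
Exon = tuple[int, int]
Event = tuple[str, int, int]


def logical_junction_catalog(exons: list[Exon]) -> set[Event]:
    """Pair every exon end (donor) with every exon start (acceptor) downstream of it."""
    donors = {end for _, end in exons}
    acceptors = {start for start, _ in exons}
    # Require at least one intronic base between donor and acceptor.
    return {("junction", d, a) for d, a in product(donors, acceptors) if a > d + 1}


def transcript_events(exons: list[Exon]) -> list[Event]:
    """Events of one transcript: its exons plus the junctions joining consecutive exons."""
    ordered = sorted(exons)
    events: list[Event] = [("exon", s, e) for s, e in ordered]
    events += [("junction", ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)]
    return events


def passes_filter(
    events: list[Event],
    counts: dict[Event, int],
    min_reads: int = 5,              # hypothetical: reads needed to call an event detected
    min_proportion: float = 0.75,    # hypothetical: share of events that must be detected
    min_mean_coverage: float = 5.0,  # hypothetical: required mean reads per event
) -> tuple[bool, float, float]:
    """Return (keep, proportion of events detected, mean reads per event)."""
    per_event = [counts.get(ev, 0) for ev in events]
    proportion = sum(c >= min_reads for c in per_event) / len(per_event)
    mean_coverage = sum(per_event) / len(per_event)
    keep = proportion >= min_proportion and mean_coverage >= min_mean_coverage
    return keep, proportion, mean_coverage


if __name__ == "__main__":
    a, b, c = (100, 200), (300, 400), (500, 600)
    transcripts = {"T1": [a, b], "T2": [b, c]}

    catalog = logical_junction_catalog([a, b, c])
    annotated = {ev for tx in transcripts.values() for ev in transcript_events(tx) if ev[0] == "junction"}
    print(sorted(catalog - annotated))  # [('junction', 200, 500)]: the a-to-c exon-skipping junction

    # Made-up read counts per event, for illustration only.
    counts = {("exon", 100, 200): 40, ("exon", 300, 400): 35, ("exon", 500, 600): 2,
              ("junction", 200, 300): 12, ("junction", 400, 500): 0}
    for name, tx in transcripts.items():
        print(name, passes_filter(transcript_events(tx), counts)[0])  # T1 True, then T2 False
```

With these made-up counts, all three events of T1 clear the detection threshold and T1 is retained, while only one of the three events of T2 is detected, so T2 is dropped before quantitation. The catalog also contains the a-to-c junction that no annotated transcript uses, the kind of novel combination of existing exons the abstract refers to.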
Journal: G3 (Bethesda), Investigations
Publication date: 2018-07-18
License: Copyright © 2018 Newman et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.