
Sample Size Estimation for Detection of Splicing Events in Transcriptome Sequencing Data

Merging data from multiple samples is required to detect lowly expressed transcripts or splicing events that might be present only in a subset of samples. However, the exact number of required replicates enabling the detection of such rare events often remains a mystery but can be approached through p...


Bibliographic Details
Main authors:	Kaisers, Wolfgang; Schwender, Holger; Schaal, Heiner
Format:	Online Article Text
Language:	English
Published:	MDPI 2017
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5618549/ https://www.ncbi.nlm.nih.gov/pubmed/28872584 http://dx.doi.org/10.3390/ijms18091900

author	Kaisers, Wolfgang; Schwender, Holger; Schaal, Heiner
collection	PubMed
description	Merging data from multiple samples is required to detect lowly expressed transcripts or splicing events that might be present only in a subset of samples. However, the exact number of replicates required to detect such rare events often remains unclear, but it can be approached through probability theory. Here, we describe a probabilistic model relating the number of observed events in a batch of samples to observation probabilities. In this model, samples appear as a heterogeneous collection of events, each of which is observed with some probability. The model is evaluated in a batch of 54 transcriptomes of human dermal fibroblast samples. The majority of putative splice-sites (alignment gap-sites) are detected in (almost) all samples or only sporadically, resulting in a U-shaped pattern of observation probabilities. The probabilistic model systematically underestimates event numbers due to a bias resulting from finite sampling. However, with an additional assumption, the probabilistic model can predict observed event numbers within <10% deviation from the median. Single samples contain a considerable number of uniquely observed putative splicing events (mean 7122 in TopHat alignments and 86,215 in STAR alignments). We conclude that the probabilistic model provides an adequate description of the observation of gap-sites in transcriptome data. Thus, required sample sizes can be calculated by applying a simple binomial model to sporadically observed random events. Given the large number of uniquely observed putative splice-sites and the known stochastic noise in the splicing machinery, it appears advisable to include the observation of rare splicing events in analysis objectives. Therefore, it is beneficial to take scores for the validation of gap-sites into account.
format	Online Article Text
id	pubmed-5618549
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-5618549 2017-09-30 Int J Mol Sci Article MDPI 2017-09-05 /pmc/articles/PMC5618549/ /pubmed/28872584 http://dx.doi.org/10.3390/ijms18091900 Text en © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	Sample Size Estimation for Detection of Splicing Events in Transcriptome Sequencing Data
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5618549/ https://www.ncbi.nlm.nih.gov/pubmed/28872584 http://dx.doi.org/10.3390/ijms18091900
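The description states that required sample sizes can be obtained by applying a simple binomial model to sporadically observed events. The following is a minimal Python sketch of that idea. It assumes that each event is observed independently in each sample with a fixed per-sample probability p, so the chance of seeing it at least once in n samples is 1 - (1 - p)^n. The function names (`detection_probability`, `required_samples`, `expected_observed_events`) are hypothetical, and this is not the authors' implementation. It also leaves out the additional assumption the authors use to correct the finite-sampling bias, so, per the description, plain expected counts like these would tend to underestimate observed event numbers.

```python
import math
from typing import Sequence

# ASSUMPTION: events are observed independently across samples, each with a
# fixed per-sample observation probability p (a plain binomial model).


def detection_probability(p: float, n: int) -> float:
    """Probability that an event is observed in at least one of n samples."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def required_samples(p: float, target: float) -> int:
    """Smallest n such that 1 - (1 - p)^n >= target."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie in (0, 1)")
    if p == 1.0:
        return 1
    # Closed-form bound: n >= log(1 - target) / log(1 - p).
    n = max(1, math.ceil(math.log1p(-target) / math.log1p(-p)))
    # Correct for floating-point rounding right at the boundary.
    while detection_probability(p, n) < target:
        n += 1
    while n > 1 and detection_probability(p, n - 1) >= target:
        n -= 1
    return n


def expected_observed_events(probabilities: Sequence[float], n: int) -> float:
    """Expected number of distinct events seen at least once in n samples,
    given one per-sample observation probability per event."""
    return sum(detection_probability(p, n) for p in probabilities)


if __name__ == "__main__":
    # HYPOTHETICAL U-shaped profile: many near-ubiquitous and many sporadic events.
    profile = [0.98] * 1000 + [0.02] * 5000
    for n in (1, 10, 54):
        print(n, round(expected_observed_events(profile, n)))
    # Samples needed to see an event with p = 0.02 with 90% probability.
    print(required_samples(0.02, 0.9))
```

The profile in the `__main__` block is invented for illustration and is not data from the 54 fibroblast transcriptomes. It only shows that, under this model, growth in observed counts with n comes almost entirely from the sporadic (small p) events, because the near-ubiquitous ones are already seen in a single sample.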


Similar Items