
Evaluating statistical analysis models for RNA sequencing experiments

Validating statistical analysis methods for RNA sequencing (RNA-seq) experiments is a complex task. Researchers often find themselves having to decide between competing models or assessing the reliability of results obtained with a designated analysis program. Computer simulation has been the most f...


Bibliographic Details
Main Authors: Reeb, Pablo D., Steibel, Juan P.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2013
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3775431/
https://www.ncbi.nlm.nih.gov/pubmed/24062766
http://dx.doi.org/10.3389/fgene.2013.00178
author Reeb, Pablo D.
Steibel, Juan P.
author_facet Reeb, Pablo D.
Steibel, Juan P.
author_sort Reeb, Pablo D.
collection PubMed
description Validating statistical analysis methods for RNA sequencing (RNA-seq) experiments is a complex task. Researchers often find themselves having to decide between competing models or to assess the reliability of results obtained with a designated analysis program. Computer simulation has been the most frequently used procedure to verify the adequacy of a model. However, datasets generated by simulations depend on the parameterization and the assumptions of the selected model. Moreover, such datasets may constitute a partial representation of reality, as the complexity of RNA-seq data is hard to mimic. We present the use of plasmode datasets to complement the evaluation of statistical models for RNA-seq data. A plasmode is a dataset obtained from experimental data but for which some truth is known. Using a set of simulated scenarios of technical and biological replicates, and publicly available datasets, we illustrate how to design algorithms to construct plasmodes under different experimental conditions. We contrast results from two types of methods for RNA-seq: (1) models based on the negative binomial distribution (edgeR and DESeq), and (2) Gaussian models applied after transformation of data (MAANOVA). Results emphasize that deciding which method to use may be experiment-specific because the distributions of expression levels are unknown. Plasmodes may help in choosing which method to apply by using a similar pre-existing dataset. The promising results obtained from this approach emphasize the need to promote and improve systematic data sharing across the research community to facilitate plasmode building. Although we illustrate the use of plasmodes for comparing differential expression analysis models, the flexibility of plasmode construction also allows comparing upstream analyses, such as normalization procedures or alignment pipelines.
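The plasmode idea in the description above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes a table of real counts from samples that all share one condition, splits them at random into two pseudo-groups (so no gene differs by construction), and then scales a randomly chosen subset of genes in one group by a fixed fold change so that the truly "differential" genes are known. The function name `build_plasmode`, its parameters, the fixed fold change, and the toy counts are all hypothetical.

```python
import random


def build_plasmode(null_counts, n_de, fold_change, seed=0):
    """Build a toy plasmode from real counts of a single condition.

    null_counts: dict mapping gene name -> list of counts (one per sample).
    Returns (plasmode, group_a, group_b, de_genes).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    genes = sorted(null_counts)
    n_samples = len(null_counts[genes[0]])

    # Randomly split the samples into two pseudo-groups. Because every
    # sample comes from the same condition, no gene is truly different.
    order = list(range(n_samples))
    rng.shuffle(order)
    half = n_samples // 2
    group_a, group_b = order[:half], order[half:]

    # Choose the genes that will carry a known, artificial effect.
    de_genes = set(rng.sample(genes, n_de))

    plasmode = {}
    for gene in genes:
        row = list(null_counts[gene])
        if gene in de_genes:
            # Assumption: a fixed multiplicative effect applied to group B.
            for j in group_b:
                row[j] = round(row[j] * fold_change)
        plasmode[gene] = row
    return plasmode, group_a, group_b, de_genes


# Toy counts (invented numbers, 5 genes x 4 samples from one condition).
counts = {
    "g1": [10, 12, 9, 11],
    "g2": [100, 95, 110, 105],
    "g3": [0, 1, 0, 2],
    "g4": [50, 48, 52, 51],
    "g5": [7, 8, 6, 9],
}
plasmode, group_a, group_b, de_genes = build_plasmode(
    counts, n_de=2, fold_change=2.0, seed=1
)
```

Because `de_genes` is known, the output of any differential expression method run on `plasmode` with the `group_a` versus `group_b` labels can be scored against it, for example by counting how many flagged genes fall outside `de_genes`.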
format Online
Article
Text
id pubmed-3775431
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-3775431 2013-09-23 Evaluating statistical analysis models for RNA sequencing experiments Reeb, Pablo D. Steibel, Juan P. Front Genet Genetics Frontiers Media S.A. 2013-09-17 /pmc/articles/PMC3775431/ /pubmed/24062766 http://dx.doi.org/10.3389/fgene.2013.00178 Text en Copyright © 2013 Reeb and Steibel. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Reeb, Pablo D.
Steibel, Juan P.
Evaluating statistical analysis models for RNA sequencing experiments
title Evaluating statistical analysis models for RNA sequencing experiments
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3775431/
https://www.ncbi.nlm.nih.gov/pubmed/24062766
http://dx.doi.org/10.3389/fgene.2013.00178