Novel Data Transformations for RNA-seq Differential Expression Analysis

We propose eight data transformations (r, r2, rv, rv2, l, l2, lv, and lv2) for RNA-seq data analysis, aiming to make the transformed sample mean representative of the distribution center, since it is not always possible to transform count data to satisfy the normality assumption. Simulation stud...

Bibliographic Details
Main authors: Zhang, Zeyu, Yu, Danyang, Seo, Minseok, Hersh, Craig P., Weiss, Scott T., Qiu, Weiliang
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6423143/
https://www.ncbi.nlm.nih.gov/pubmed/30886278
http://dx.doi.org/10.1038/s41598-019-41315-w
_version_ 1783404490787913728
author Zhang, Zeyu
Yu, Danyang
Seo, Minseok
Hersh, Craig P.
Weiss, Scott T.
Qiu, Weiliang
author_sort Zhang, Zeyu
collection PubMed
description We propose eight data transformations (r, r2, rv, rv2, l, l2, lv, and lv2) for RNA-seq data analysis, aiming to make the transformed sample mean representative of the distribution center, since it is not always possible to transform count data to satisfy the normality assumption. Simulation studies showed that for data sets with small (e.g., nCases = nControls = 3) or large (e.g., nCases = nControls = 100) sample sizes, limma based on data from the l, l2, and r2 transformations performed better than limma based on data from the voom transformation in terms of accuracy, FDR, and FNR. For data sets with moderate sample sizes (e.g., nCases = nControls = 30 or 50), limma with the rv and rv2 transformations performed similarly to limma with the voom transformation. Real data analysis results are consistent with the simulation results: limma with the r, l, r2, and l2 transformations performed better than limma with the voom transformation when sample sizes are small or large; limma with the rv and rv2 transformations performed similarly to limma with the voom transformation when sample sizes are moderate. We also observed from our data analyses that for data sets with large sample sizes, gene selection via the Wilcoxon rank sum test (a non-parametric two-sample test) based on the raw data outperformed limma based on the transformed data.
format Online
Article
Text
id pubmed-6423143
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6423143 2019-03-26 Novel Data Transformations for RNA-seq Differential Expression Analysis Zhang, Zeyu Yu, Danyang Seo, Minseok Hersh, Craig P. Weiss, Scott T. Qiu, Weiliang Sci Rep Article
Nature Publishing Group UK 2019-03-18 /pmc/articles/PMC6423143/ /pubmed/30886278 http://dx.doi.org/10.1038/s41598-019-41315-w Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Novel Data Transformations for RNA-seq Differential Expression Analysis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6423143/
https://www.ncbi.nlm.nih.gov/pubmed/30886278
http://dx.doi.org/10.1038/s41598-019-41315-w
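
The abstract compares methods in terms of accuracy, FDR, and FNR. As a minimal sketch of how such metrics are commonly computed from a set of gene calls, the Python function below uses the standard confusion-table definitions. The exact definitions used in the article may differ, and the function name, argument names, and example labels are hypothetical, chosen only for illustration.

```python
def selection_metrics(truth, called):
    """Accuracy, FDR, and FNR for differential-expression calls.

    truth[i]  -- True if gene i is truly differentially expressed
                 (hypothetical ground-truth labels, e.g. from a simulation).
    called[i] -- True if the method flagged gene i as differentially expressed.
    Standard textbook definitions are assumed, not taken from the article.
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")

    pairs = list(zip(truth, called))
    tp = sum(1 for t, c in pairs if t and c)          # true positives
    fp = sum(1 for t, c in pairs if not t and c)      # false positives
    fn = sum(1 for t, c in pairs if t and not c)      # false negatives
    tn = sum(1 for t, c in pairs if not t and not c)  # true negatives

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else float("nan")
    # FDR: share of flagged genes that are not truly differentially expressed.
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    # FNR: share of truly differentially expressed genes that were missed.
    fnr = fn / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "FDR": fdr, "FNR": fnr}


# Hypothetical example: 8 genes, 3 truly differentially expressed.
example_truth = [True, True, True, False, False, False, False, False]
example_called = [True, True, False, True, False, False, False, False]
example_metrics = selection_metrics(example_truth, example_called)
```

In this made-up example there are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, so accuracy is 6/8 and both FDR and FNR are 1/3. When no genes are flagged, FDR is set to 0 by convention here; that choice is also an assumption.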