Novel Data Transformations for RNA-seq Differential Expression Analysis
We propose eight data transformations (r, r2, rv, rv2, l, l2, lv, and lv2) for RNA-seq data analysis that aim to make the transformed sample mean representative of the distribution center, since it is not always possible to transform count data to satisfy the normality assumption. Simulation stud...
Main authors: | Zhang, Zeyu; Yu, Danyang; Seo, Minseok; Hersh, Craig P.; Weiss, Scott T.; Qiu, Weiliang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Nature Publishing Group UK
2019
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6423143/ https://www.ncbi.nlm.nih.gov/pubmed/30886278 http://dx.doi.org/10.1038/s41598-019-41315-w |
_version_ | 1783404490787913728 |
---|---|
author | Zhang, Zeyu Yu, Danyang Seo, Minseok Hersh, Craig P. Weiss, Scott T. Qiu, Weiliang |
author_sort | Zhang, Zeyu |
collection | PubMed |
description | We propose eight data transformations (r, r2, rv, rv2, l, l2, lv, and lv2) for RNA-seq data analysis that aim to make the transformed sample mean representative of the distribution center, since it is not always possible to transform count data to satisfy the normality assumption. Simulation studies showed that for datasets with small (e.g., nCases = nControls = 3) or large (e.g., nCases = nControls = 100) sample sizes, limma based on data from the l, l2, and r2 transformations performed better than limma based on data from the voom transformation in terms of accuracy, FDR, and FNR. For datasets with moderate sample sizes (e.g., nCases = nControls = 30 or 50), limma with the rv and rv2 transformations performed similarly to limma with the voom transformation. Real data analysis results are consistent with the simulation results: limma with the r, l, r2, and l2 transformations performed better than limma with the voom transformation when sample sizes were small or large; limma with the rv and rv2 transformations performed similarly to limma with the voom transformation when sample sizes were moderate. We also observed from our data analyses that, for datasets with large sample sizes, gene selection via the Wilcoxon rank sum test (a non-parametric two-sample test) based on the raw data outperformed limma based on the transformed data. |
format | Online Article Text |
id | pubmed-6423143 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6423143 2019-03-26 Novel Data Transformations for RNA-seq Differential Expression Analysis Zhang, Zeyu Yu, Danyang Seo, Minseok Hersh, Craig P. Weiss, Scott T. Qiu, Weiliang Sci Rep Article
Nature Publishing Group UK 2019-03-18 /pmc/articles/PMC6423143/ /pubmed/30886278 http://dx.doi.org/10.1038/s41598-019-41315-w Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Novel Data Transformations for RNA-seq Differential Expression Analysis |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6423143/ https://www.ncbi.nlm.nih.gov/pubmed/30886278 http://dx.doi.org/10.1038/s41598-019-41315-w |
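
The description above mentions gene selection via the Wilcoxon rank sum test on raw counts, evaluated by accuracy, FDR, and FNR. The following is a minimal sketch of those two ideas, not the authors' implementation. It assumes a two-sided test with the normal approximation (tie-corrected, no continuity correction), which is only reasonable for larger sample sizes, and it assumes the conventional definitions FDR = FP / (TP + FP) and FNR = FN / (TP + FN); this record does not state the paper's exact definitions. The function names and the example counts are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(cases, controls):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (U statistic of the cases, approximate p-value).
    """
    n1, n2 = len(cases), len(controls)
    n = n1 + n2
    pooled = list(cases) + list(controls)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    # Tie correction for the variance of U.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a shift
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2.0))


def error_rates(called, truth, all_genes):
    """Accuracy, FDR, and FNR of a set of called genes against known truth."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    tn = len(all_genes) - tp - fp - fn
    accuracy = (tp + tn) / len(all_genes)
    fdr = fp / (tp + fp) if tp + fp else 0.0
    fnr = fn / (tp + fn) if tp + fn else 0.0
    return accuracy, fdr, fnr


# Hypothetical raw counts for one gene in five cases and five controls.
u_stat, p_value = rank_sum_test([10, 12, 15, 18, 20], [1, 2, 3, 4, 5])
```

In a full analysis the test would be applied to every gene, and the resulting p-values would need a multiple-testing adjustment before a set of genes is called; that step is omitted here.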