Super-delta: a new differential gene expression analysis procedure with robust data normalization
BACKGROUND: Normalization is an important data preparation step in gene expression analyses, designed to remove various sources of systematic noise. Sample variance is greatly reduced after normalization, hence the power of subsequent statistical analyses is likely to increase. On the other hand, variance redu...
Main authors: | Liu, Yuhang; Zhang, Jinfeng; Qiu, Xing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5740711/ https://www.ncbi.nlm.nih.gov/pubmed/29268715 http://dx.doi.org/10.1186/s12859-017-1992-2 |
author | Liu, Yuhang; Zhang, Jinfeng; Qiu, Xing |
---|---|
collection | PubMed |
description | BACKGROUND: Normalization is an important data preparation step in gene expression analyses, designed to remove various sources of systematic noise. Sample variance is greatly reduced after normalization, so the power of subsequent statistical analyses is likely to increase. On the other hand, variance reduction is made possible by borrowing information across all genes, including differentially expressed genes (DEGs) and outliers, which inevitably introduces some bias. This bias typically inflates the type I error rate and can reduce statistical power in certain situations. In this study we propose a new differential expression analysis pipeline, dubbed super-delta, that consists of a multivariate extension of global normalization and a modified t-test. A robust procedure is designed to minimize the bias introduced by DEGs in the normalization step. The modified t-test is derived from asymptotic theory for hypothesis testing and pairs suitably with the proposed robust normalization. RESULTS: In simulation studies, we first compared super-delta with four commonly used normalization methods: global, median-IQR, quantile, and cyclic loess normalization. Super-delta was shown to have better statistical power with tighter control of the type I error rate than its competitors. In many cases, the performance of super-delta is close to that of an oracle test that used datasets without technical noise. We then applied all methods to a collection of gene expression datasets from breast cancer patients who received neoadjuvant chemotherapy. Although the DEGs identified by all methods overlap substantially, super-delta identified comparatively more DEGs than its competitors. Downstream gene set enrichment analysis confirmed that all these methods selected largely consistent pathways. Detailed investigation of the relatively small differences showed that the pathways identified by super-delta are more closely connected to breast cancer than those identified by the other methods. CONCLUSIONS: As a new pipeline, super-delta provides new insights into differential gene expression analysis. A solid theoretical foundation supports its asymptotic unbiasedness and its freedom from technical noise. Applications to real and simulated datasets demonstrate its good performance compared with state-of-the-art procedures. It also has the potential to be extended to other data types and/or more general between-group comparison problems. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1992-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5740711 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5740711 (2018-01-03); BMC Bioinformatics; Methodology Article; BioMed Central, 2017-12-21; /pmc/articles/PMC5740711/; /pubmed/29268715; http://dx.doi.org/10.1186/s12859-017-1992-2; Text; en; © The Author(s) 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Super-delta: a new differential gene expression analysis procedure with robust data normalization |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5740711/ https://www.ncbi.nlm.nih.gov/pubmed/29268715 http://dx.doi.org/10.1186/s12859-017-1992-2 |
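The abstract's point that global normalization borrows information from all genes, including DEGs, and therefore introduces bias can be illustrated with a toy example. The Python sketch below is not the super-delta procedure. It only contrasts classic global (mean-centering) normalization with a median-centered variant, assuming additive per-sample technical offsets on a log scale. The data, gene counts, and function names (`global_normalize`, `group_differences`, `make_toy_data`) are hypothetical.

```python
from statistics import mean, median


def global_normalize(x, center=mean):
    """Subtract each sample's center, taken over all genes, from that sample.

    x is a list of genes; each gene is a list of per-sample log-expression
    values. center=mean gives classic global normalization; passing
    center=median gives a simple robust variant (not super-delta).
    """
    n_samples = len(x[0])
    centers = [center([gene[s] for gene in x]) for s in range(n_samples)]
    return [[gene[s] - centers[s] for s in range(n_samples)] for gene in x]


def group_differences(x, group_a, group_b):
    """Per-gene difference of group means (B minus A), i.e. a log fold change."""
    return [mean(gene[s] for s in group_b) - mean(gene[s] for s in group_a)
            for gene in x]


def make_toy_data():
    """Hypothetical data: 10 genes x 8 samples, samples 0-3 = group A, 4-7 = B.

    Every sample carries an additive technical offset; genes 8 and 9 are
    up-regulated by 4 log units in group B, and all other genes are not DE.
    """
    offsets = [0.5, -0.3, 0.2, -0.4, 0.1, 0.6, -0.2, 0.3]
    x = []
    for g in range(10):
        effect = 4.0 if g >= 8 else 0.0
        x.append([g + offsets[s] + (effect if s >= 4 else 0.0)
                  for s in range(8)])
    return x


if __name__ == "__main__":
    data = make_toy_data()
    group_a, group_b = list(range(4)), list(range(4, 8))
    for name, center in (("mean", mean), ("median", median)):
        diffs = group_differences(global_normalize(data, center), group_a, group_b)
        # Ideally non-DE genes (0-7) show 0 and DE genes (8-9) show 4.
        print(name, [round(d, 2) for d in diffs])
```

With mean centering, the two up-regulated genes raise the centers of the group-B samples by 2 × 4 / 10 = 0.8. Every non-DE gene therefore acquires a spurious log fold change of about -0.8, and the true effect of the DEGs shrinks to about 3.2. In this toy setting the median center is unaffected because the up-regulated genes stay above each sample's median before and after the shift. Super-delta's own robust normalization and modified t-test are described in the article.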