
Super-delta: a new differential gene expression analysis procedure with robust data normalization

BACKGROUND: Normalization is an important data preparation step in gene expression analyses, designed to remove various sources of systematic noise. Sample variance is greatly reduced after normalization, hence the power of subsequent statistical analyses is likely to increase. On the other hand, variance redu...


Bibliographic Details
Main authors: Liu, Yuhang; Zhang, Jinfeng; Qiu, Xing
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5740711/
https://www.ncbi.nlm.nih.gov/pubmed/29268715
http://dx.doi.org/10.1186/s12859-017-1992-2
_version_ 1783288074227154944
author Liu, Yuhang
Zhang, Jinfeng
Qiu, Xing
collection PubMed
description BACKGROUND: Normalization is an important data preparation step in gene expression analyses, designed to remove various sources of systematic noise. Sample variance is greatly reduced after normalization, hence the power of subsequent statistical analyses is likely to increase. On the other hand, variance reduction is made possible by borrowing information across all genes, including differentially expressed genes (DEGs) and outliers, which will inevitably introduce some bias. This bias typically inflates type I error and can reduce statistical power in certain situations. In this study we propose a new differential expression analysis pipeline, dubbed super-delta, that consists of a multivariate extension of global normalization and a modified t-test. A robust procedure is designed to minimize the bias introduced by DEGs in the normalization step. The modified t-test is derived from asymptotic theory for hypothesis testing and pairs suitably with the proposed robust normalization. RESULTS: We first compared super-delta with four commonly used normalization methods (global, median-IQR, quantile, and cyclic loess normalization) in simulation studies. Super-delta was shown to have better statistical power with tighter control of the type I error rate than its competitors. In many cases, the performance of super-delta is close to that of an oracle test in which datasets without technical noise were used. We then applied all methods to a collection of gene expression datasets from breast cancer patients who received neoadjuvant chemotherapy. While there is substantial overlap among the DEGs identified by all methods, super-delta was able to identify comparatively more DEGs than its competitors. Downstream gene set enrichment analysis confirmed that all these methods selected largely consistent pathways. Detailed investigations of the relatively small differences showed that pathways identified by super-delta have better connections to breast cancer than those identified by other methods. CONCLUSIONS: As a new pipeline, super-delta provides new insights into differential gene expression analysis. A solid theoretical foundation supports its asymptotic unbiasedness and technical noise-free properties. Implementation on real and simulated datasets demonstrates its decent performance compared with state-of-the-art procedures. It also has the potential to be extended to other data types and/or more general between-group comparison problems. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1992-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5740711
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5740711 2018-01-03 Super-delta: a new differential gene expression analysis procedure with robust data normalization Liu, Yuhang Zhang, Jinfeng Qiu, Xing BMC Bioinformatics Methodology Article BioMed Central 2017-12-21 /pmc/articles/PMC5740711/ /pubmed/29268715 http://dx.doi.org/10.1186/s12859-017-1992-2 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Super-delta: a new differential gene expression analysis procedure with robust data normalization
topic Methodology Article
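Illustrative note: the abstract states that normalization borrows information across all genes, including DEGs, and that this introduces bias. The sketch below is a toy illustration of that point only, not the super-delta procedure itself. It applies global normalization (subtracting each sample's mean log-expression) to a small noise-free matrix and compares it with median centering, used here as a simple robust stand-in. The matrix, the effect size, and the function names `global_normalize`, `median_center`, and `group_diff` are all assumptions made for this example.

```python
import numpy as np

def global_normalize(log_expr):
    """Subtract each sample's mean log-expression (rows: genes, columns: samples)."""
    return log_expr - log_expr.mean(axis=0, keepdims=True)

def median_center(log_expr):
    """Subtract each sample's median log-expression (a simple robust alternative)."""
    return log_expr - np.median(log_expr, axis=0, keepdims=True)

def group_diff(norm, gene):
    """Mean of group B (columns 2-3) minus mean of group A (columns 0-1) for one gene."""
    return norm[gene, 2:].mean() - norm[gene, :2].mean()

# Hypothetical toy data: 10 genes x 4 samples, two samples per group, no random noise.
base = np.arange(10, dtype=float).reshape(-1, 1)   # gene-specific baseline levels
expr = np.repeat(base, 4, axis=1)
expr[:3, 2:] += 2.0                                # genes 0-2 are up-regulated in group B
expr += np.array([0.0, 0.5, -0.3, 1.0])            # sample-specific technical shifts

g = global_normalize(expr)
m = median_center(expr)

bias_global = group_diff(g, 9)   # gene 9 is not differentially expressed
bias_median = group_diff(m, 9)
deg_global = group_diff(g, 0)    # gene 0 has a true difference of 2.0

print(f"null gene, global normalization: {bias_global:.2f}")
print(f"null gene, median centering:     {bias_median:.2f}")
print(f"DEG, global normalization:       {deg_global:.2f}")
```

In this toy case the three up-regulated genes raise the group B sample means by 0.3 × 2.0 = 0.6, so global normalization shifts a non-differentially expressed gene by about -0.6 and shrinks the true difference of a DEG from 2.0 to about 1.4. Median centering happens to remove the technical shifts without that distortion here; with a larger fraction of DEGs it would be affected as well.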