
Highly efficient hypothesis testing methods for regression-type tests with correlated observations and heterogeneous variance structure

BACKGROUND: In many practical hypothesis testing (H-T) applications, the data are correlated and/or have a heterogeneous variance structure. The regression t-test for weighted linear mixed-effects regression (LMER) is a legitimate choice because it accounts for complex covariance structure; however,...


Bibliographic Details
Main Authors: Zhang, Yun; Bandyopadhyay, Gautam; Topham, David J.; Falsey, Ann R.; Qiu, Xing
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6466736/
https://www.ncbi.nlm.nih.gov/pubmed/30987598
http://dx.doi.org/10.1186/s12859-019-2783-8
author Zhang, Yun
Bandyopadhyay, Gautam
Topham, David J.
Falsey, Ann R.
Qiu, Xing
collection PubMed
description BACKGROUND: In many practical hypothesis testing (H-T) applications, the data are correlated and/or have a heterogeneous variance structure. The regression t-test for weighted linear mixed-effects regression (LMER) is a legitimate choice because it accounts for complex covariance structure; however, high computational costs and occasional convergence issues make it impractical for analyzing high-throughput data. In this paper, we propose computationally efficient parametric and semiparametric tests based on a set of specialized matrix techniques dubbed the PB-transformation. The PB-transformation has two advantages: (1) the PB-transformed data have a scalar variance-covariance matrix; (2) the original H-T problem is reduced to an equivalent one-sample H-T problem. The transformed problem can then be approached with either the one-sample Student’s t-test or the Wilcoxon signed rank test.
RESULTS: In simulation studies, the proposed methods outperform commonly used alternative methods under both normal and double exponential distributions. In particular, the PB-transformed t-test produces notably better results than the weighted LMER test, especially in the high correlation case, while using only a small fraction of the computational cost (3 s versus 933 s). We apply these two methods to a set of RNA-seq gene expression data collected in a breast cancer study. Pathway analyses show that the PB-transformed t-test reveals more biologically relevant findings in relation to breast cancer than the weighted LMER test.
CONCLUSIONS: As fast and numerically stable replacements for the weighted LMER test, the PB-transformed tests are especially suitable for “messy” high-throughput data that include both independent and matched/repeated samples. With our method, practitioners no longer have to choose between using partial data (applying paired tests to only the matched samples) and ignoring the correlation in the data (applying two-sample tests to data with some correlated samples). Our method is implemented as an R package ‘PBtest’ and is available at https://github.com/yunzhang813/PBtest-R-Package.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2783-8) contains supplementary material, which is available to authorized users.
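The first advantage named in the abstract, a scalar variance-covariance matrix after transformation, can be illustrated with a generic whitening step. The sketch below is not the authors' PB-transformation and does not use the PBtest package; it assumes the covariance matrix is known, uses a plain Cholesky factor, and all function names and the example matrix are hypothetical. It only shows why a linear map can turn correlated, unequal-variance observations into uncorrelated, unit-variance ones.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low * x = b by forward substitution."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def whiten(sigma, y):
    """Return L^{-1} y; if Cov(y) = sigma, the result has identity covariance."""
    return forward_solve(cholesky(sigma), y)


def whitened_covariance(sigma):
    """Compute L^{-1} sigma L^{-T}, the covariance of the whitened vector."""
    low = cholesky(sigma)
    n = len(sigma)
    # Columns of M = L^{-1} sigma, one forward solve per column of sigma.
    cols = [forward_solve(low, [sigma[i][j] for i in range(n)]) for j in range(n)]
    m = [[cols[j][i] for j in range(n)] for i in range(n)]
    # Row j of M L^{-T} equals L^{-1} applied to row j of M.
    return [forward_solve(low, m[j]) for j in range(n)]


# Hypothetical paired observation: variances 1 and 4, correlation 0.8.
sigma = [[1.0, 1.6], [1.6, 4.0]]
z = whiten(sigma, [1.0, 4.0])
cov_z = whitened_covariance(sigma)
print(z)
print(cov_z)
```

Whitening alone does not reduce a regression-type test to a one-sample problem; in the abstract that reduction is the second property of the PB-transformation, which this sketch does not attempt to reproduce.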
format Online
Article
Text
id pubmed-6466736
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6466736 2019-04-22 BMC Bioinformatics Methodology Article BioMed Central 2019-04-15 Text en
© The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Highly efficient hypothesis testing methods for regression-type tests with correlated observations and heterogeneous variance structure
topic Methodology Article