Multi-scale Fisher’s independence test for multivariate dependence
Identifying dependency in multivariate data is a common inference task that arises in numerous applications. However, existing nonparametric independence tests typically require computation that scales at least quadratically with the sample size, making it difficult to apply them in the presence of...
Main authors: | GORSKY, S., MA, L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9648765/ https://www.ncbi.nlm.nih.gov/pubmed/36381997 http://dx.doi.org/10.1093/biomet/asac013 |
_version_ | 1784827648741998592 |
---|---|
author | GORSKY, S. MA, L. |
author_facet | GORSKY, S. MA, L. |
author_sort | GORSKY, S. |
collection | PubMed |
description | Identifying dependency in multivariate data is a common inference task that arises in numerous applications. However, existing nonparametric independence tests typically require computation that scales at least quadratically with the sample size, making it difficult to apply them in the presence of massive sample sizes. Moreover, resampling is usually necessary to evaluate the statistical significance of the resulting test statistics at finite sample sizes, further worsening the computational burden. We introduce a scalable, resampling-free approach to testing the independence between two random vectors by breaking down the task into simple univariate tests of independence on a collection of 2 × 2 contingency tables constructed through sequential coarse-to-fine discretization of the sample space, transforming the inference task into a multiple testing problem that can be completed with almost linear complexity with respect to the sample size. To address increasing dimensionality, we introduce a coarse-to-fine sequential adaptive procedure that exploits the spatial features of dependency structures. We derive a finite-sample theory that guarantees the inferential validity of our adaptive procedure at any given sample size. We show that our approach can achieve strong control of the level of the testing procedure at any sample size without resampling or asymptotic approximation and establish its large-sample consistency. We demonstrate through an extensive simulation study its substantial computational advantage in comparison to existing approaches while achieving robust statistical power under various dependency scenarios, and illustrate how its divide-and-conquer nature can be exploited to not just test independence, but to learn the nature of the underlying dependency. Finally, we demonstrate the use of our method through analysing a dataset from a flow cytometry experiment. |
format | Online Article Text |
id | pubmed-9648765 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
record_format | MEDLINE/PubMed |
spelling | pubmed-9648765 2022-11-14 Multi-scale Fisher’s independence test for multivariate dependence GORSKY, S. MA, L. Biometrika Article 2022-09 2022-02-21 /pmc/articles/PMC9648765/ /pubmed/36381997 http://dx.doi.org/10.1093/biomet/asac013 Text en https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Article GORSKY, S. MA, L. Multi-scale Fisher’s independence test for multivariate dependence |
title | Multi-scale Fisher’s independence test for multivariate dependence |
title_full | Multi-scale Fisher’s independence test for multivariate dependence |
title_fullStr | Multi-scale Fisher’s independence test for multivariate dependence |
title_full_unstemmed | Multi-scale Fisher’s independence test for multivariate dependence |
title_short | Multi-scale Fisher’s independence test for multivariate dependence |
title_sort | multi-scale fisher’s independence test for multivariate dependence |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9648765/ https://www.ncbi.nlm.nih.gov/pubmed/36381997 http://dx.doi.org/10.1093/biomet/asac013 |
work_keys_str_mv | AT gorskys multiscalefishersindependencetestformultivariatedependence AT mal multiscalefishersindependencetestformultivariatedependence |
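
The description above outlines testing independence through Fisher's exact tests on 2 × 2 contingency tables built from a coarse-to-fine discretization of the sample space. The following is a minimal sketch of that general idea for two scalar variables only, not the authors' procedure: it omits their adaptive selection of tables and their multivariate construction, and it swaps in a plain Bonferroni correction over a fixed collection of tables. The function names (`fisher_exact_2x2`, `ranks`, `multiscale_fisher`) and the `max_level` parameter are hypothetical and chosen for illustration.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Unnormalised hypergeometric weights, kept as exact integers.
    weights = {k: comb(r1, k) * comb(r2, c1 - k) for k in range(lo, hi + 1)}
    observed = weights[a]
    # Sum the weights of all tables that are no more likely than the observed one.
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(r1 + r2, c1)


def ranks(values):
    """Rank of each value in 0..n-1 (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for r, i in enumerate(order):
        out[i] = r
    return out


def multiscale_fisher(x, y, max_level=2):
    """Bonferroni-adjusted minimum p-value over a fixed set of 2 x 2 tables.

    Assumption for this sketch: both margins are rank-transformed, and each
    cell of a dyadic grid (2**kx by 2**ky cells) is split in half along
    each axis to form one 2 x 2 table.
    """
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    p_values = []
    for kx in range(max_level + 1):
        for ky in range(max_level + 1):
            tables = {
                (cx, cy): [0, 0, 0, 0]
                for cx in range(2 ** kx)
                for cy in range(2 ** ky)
            }
            for i in range(n):
                # Index on the grid one level finer; the low bit says which half.
                ix = (rx[i] << (kx + 1)) // n
                iy = (ry[i] << (ky + 1)) // n
                tables[(ix >> 1, iy >> 1)][2 * (ix & 1) + (iy & 1)] += 1
            p_values.extend(fisher_exact_2x2(*t) for t in tables.values())
    # Every table in the fixed collection counts as one test.
    return min(1.0, len(p_values) * min(p_values))
```

A call such as `multiscale_fisher(x, y)` returns a single adjusted p-value. One pass over the data is made per pair of resolution levels, so with a fixed `max_level` the table counting grows linearly in the sample size; the exact-integer Fisher computation used here is chosen for simplicity rather than speed.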