
Multi-scale Fisher’s independence test for multivariate dependence

Bibliographic Details
Main Authors: GORSKY, S., MA, L.
Format: Online Article Text
Language: English
Published: 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9648765/
https://www.ncbi.nlm.nih.gov/pubmed/36381997
http://dx.doi.org/10.1093/biomet/asac013
author GORSKY, S.
MA, L.
collection PubMed
description Identifying dependency in multivariate data is a common inference task that arises in numerous applications. However, existing nonparametric independence tests typically require computation that scales at least quadratically with the sample size, making it difficult to apply them in the presence of massive sample sizes. Moreover, resampling is usually necessary to evaluate the statistical significance of the resulting test statistics at finite sample sizes, further worsening the computational burden. We introduce a scalable, resampling-free approach to testing the independence between two random vectors by breaking down the task into simple univariate tests of independence on a collection of 2 × 2 contingency tables constructed through sequential coarse-to-fine discretization of the sample space, transforming the inference task into a multiple testing problem that can be completed with almost linear complexity with respect to the sample size. To address increasing dimensionality, we introduce a coarse-to-fine sequential adaptive procedure that exploits the spatial features of dependency structures. We derive a finite-sample theory that guarantees the inferential validity of our adaptive procedure at any given sample size. We show that our approach can achieve strong control of the level of the testing procedure at any sample size without resampling or asymptotic approximation and establish its large-sample consistency. We demonstrate through an extensive simulation study its substantial computational advantage in comparison to existing approaches while achieving robust statistical power under various dependency scenarios, and illustrate how its divide-and-conquer nature can be exploited to not just test independence, but to learn the nature of the underlying dependency. Finally, we demonstrate the use of our method through analysing a dataset from a flow cytometry experiment.
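The core idea in the description, breaking an independence test into many 2 × 2 contingency tables built by coarse-to-fine discretization and then treating the result as a multiple testing problem, can be illustrated with a small sketch. This is a simplified, hypothetical illustration and not the authors' procedure: it assumes univariate X and Y, rank-transforms both to (0, 1), splits each dyadic cell of the unit square at its midpoint, applies a two-sided Fisher's exact test to each resulting 2 × 2 table, and uses a plain Bonferroni correction in place of the paper's multiple-testing and sequential adaptive procedure. The names `multiscale_2x2_test`, `to_unit_ranks`, `max_depth`, and `alpha` are illustrative assumptions, and the sketch makes no claim to the paper's finite-sample guarantees.

```python
import math
import random


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Unnormalised hypergeometric weights of every table sharing these margins.
    weights = [math.comb(r1, k) * math.comb(r2, c1 - k) for k in range(lo, hi + 1)]
    observed = weights[a - lo]
    # Sum tables no more likely than the observed one (exact integer comparison).
    return sum(w for w in weights if w <= observed) / math.comb(r1 + r2, c1)


def to_unit_ranks(values: list[float]) -> list[float]:
    """Map a sample into (0, 1) via ranks; ties are broken by input order."""
    n = len(values)
    u = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda j: values[j])):
        u[i] = (rank + 0.5) / n
    return u


def multiscale_2x2_test(x, y, max_depth=3, alpha=0.05):
    """Hypothetical coarse-to-fine independence test for univariate x and y.

    At depth k the rank-transformed unit square is cut into 2^k x 2^k cells;
    each cell is split at its midpoint in both coordinates to form a 2x2
    table tested with Fisher's exact test. Bonferroni correction is applied
    over all sum_k 4^k tables. Returns (reject, bonferroni_adjusted_min_p).
    """
    u, v = to_unit_ranks(x), to_unit_ranks(y)
    pvalues = []
    for depth in range(max_depth):
        cells = 2 ** depth
        width = 1.0 / cells
        tables = {}
        for ui, vi in zip(u, v):
            gx, gy = min(int(ui * cells), cells - 1), min(int(vi * cells), cells - 1)
            mx, my = (gx + 0.5) * width, (gy + 0.5) * width
            table = tables.setdefault((gx, gy), [0, 0, 0, 0])
            # Index 0..3 = (left/right half in x) x (lower/upper half in y).
            table[2 * (ui >= mx) + (vi >= my)] += 1
        for gx in range(cells):
            for gy in range(cells):
                # Empty cells yield p = 1 but still count toward the Bonferroni total.
                pvalues.append(fisher_exact_two_sided(*tables.get((gx, gy), [0, 0, 0, 0])))
    adjusted = min(1.0, min(pvalues) * len(pvalues))
    return adjusted <= alpha, adjusted


if __name__ == "__main__":
    random.seed(0)
    x = [random.random() for _ in range(400)]
    y = [xi + 0.01 * random.gauss(0, 1) for xi in x]  # strongly dependent pair
    print(multiscale_2x2_test(x, y))
```

Each table costs time linear in the number of points it contains, and a point falls in exactly one cell per depth, so one pass over all depths is roughly linear in the sample size. This is the flavour of scalability the description refers to, although the paper's own complexity analysis covers its multivariate, adaptive version.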
format Online
Article
Text
id pubmed-9648765
institution National Center for Biotechnology Information
language English
publishDate 2022
record_format MEDLINE/PubMed
spelling pubmed-9648765 2022-11-14 Multi-scale Fisher’s independence test for multivariate dependence GORSKY, S. MA, L. Biometrika Article 2022-09 2022-02-21 /pmc/articles/PMC9648765/ /pubmed/36381997 http://dx.doi.org/10.1093/biomet/asac013 Text en https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Multi-scale Fisher’s independence test for multivariate dependence
topic Article