Accelerating Causal Inference and Feature Selection Methods through G-Test Computation Reuse

Bibliographic Details

Main Authors: Băncioiu, Camil; Brad, Remus
Format: Online Article Text
Language: English
Published: MDPI, 2021
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8619989/
https://www.ncbi.nlm.nih.gov/pubmed/34828198
http://dx.doi.org/10.3390/e23111501
Description: This article presents a novel and remarkably efficient method of computing the statistical G-test, made possible by exploiting a connection with the fundamental elements of information theory: by writing the G statistic as a sum of joint entropy terms, its computation is decomposed into easily reusable partial results with no change in the resulting value. This method greatly improves the efficiency of applications that perform a series of G-tests on permutations of the same features, such as feature selection and causal inference applications, because the decomposition allows for intensive reuse of these partial results. The efficiency of the method is demonstrated by implementing it as part of an experiment involving IPC–MB, an efficient Markov blanket discovery algorithm applicable both as a feature selection algorithm and as a causal inference method. The results show outstanding efficiency gains for IPC–MB when the G-test is computed with the proposed method, compared both to the unoptimized G-test and to IPC–MB++, a variant of IPC–MB enhanced with an AD–tree, whether static or dynamic. Although the proposed method of computing the G-test is presented here in the context of IPC–MB, it is bound neither to IPC–MB in particular nor to feature selection or causal inference applications in general, because it targets the information-theoretic concept that underlies the G-test, namely conditional mutual information. This grants it wide applicability in the data sciences.

Journal: Entropy (Basel)
Published online: 2021-11-12
Collection: PubMed (National Center for Biotechnology Information)
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
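The decomposition the abstract describes can be sketched in a few lines. The following is an illustrative Python sketch, not the authors' implementation: it uses the standard identity G = 2N · I(X;Y|Z), expanding the conditional mutual information into four joint entropy terms, H(X,Z) + H(Y,Z) − H(X,Y,Z) − H(Z). Because each term depends only on a subset of the variables, it can be cached and reused across the many overlapping tests a Markov blanket discovery algorithm performs. All class and method names here are hypothetical.

```python
import math
from collections import Counter

class GTestReuse:
    """Compute G statistics for conditional independence tests,
    caching joint entropy terms so they are shared across tests."""

    def __init__(self, data):
        # data: list of rows, each row a tuple of discrete values
        self.data = data
        self.n = len(data)
        self._cache = {}  # frozenset of column indices -> joint entropy (nats)

    def joint_entropy(self, cols):
        """Empirical joint entropy of the given columns, memoized on the
        (unordered) set of columns, since H is order-invariant."""
        key = frozenset(cols)
        if key not in self._cache:
            counts = Counter(
                tuple(row[c] for c in sorted(key)) for row in self.data
            )
            self._cache[key] = -sum(
                (c / self.n) * math.log(c / self.n) for c in counts.values()
            )
        return self._cache[key]

    def g_statistic(self, x, y, z=()):
        """G = 2N * [H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)],
        i.e. 2N times the conditional mutual information I(X;Y|Z)."""
        z = tuple(z)
        h_xz = self.joint_entropy((x,) + z)
        h_yz = self.joint_entropy((y,) + z)
        h_xyz = self.joint_entropy((x, y) + z)
        h_z = self.joint_entropy(z) if z else 0.0
        return 2.0 * self.n * (h_xz + h_yz - h_xyz - h_z)
```

The reuse comes from the cache: a test of (X, Y | Z) and a later test of (X, W | Z) share the H(X,Z) and H(Z) terms, so each entropy is counted over the data only once, however many tests reference it.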