A Novel Algorithm to Estimate the Significance Level of a Feature Interaction Using the Extreme Gradient Boosting Machine

Recent studies have revealed the importance of interaction effects in cardiac research. An analysis can lead to an erroneous conclusion when it fails to account for a significant interaction. Regression models handle an interaction by adding the product of the two interacting variables. T...
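
For readers who want the regression approach spelled out, a two-predictor model with a product term can be written as below. The symbols are illustrative and are not taken from the article; the interaction is assessed by testing whether the coefficient on the product term is zero.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon,
\qquad H_0\colon \beta_3 = 0
```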

Bibliographic Details
Main authors: Guo, Chao-Yu; Chang, Ke-Hao
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8871671/
https://www.ncbi.nlm.nih.gov/pubmed/35206527
http://dx.doi.org/10.3390/ijerph19042338
_version_ 1784657051644854272
author Guo, Chao-Yu
Chang, Ke-Hao
collection PubMed
description Recent studies have revealed the importance of interaction effects in cardiac research. An analysis can lead to an erroneous conclusion when it fails to account for a significant interaction. Regression models handle an interaction by adding the product of the two interacting variables, so statistical methods can evaluate the significance and contribution of the interaction term. Machine learning strategies, however, cannot provide the p-value of a specific feature interaction. Therefore, we propose a novel machine learning algorithm to assess the p-value of a feature interaction, named the extreme gradient boosting machine for feature interaction (XGB-FI). The first step incorporates the concept of statistical methodology by stratifying the original data into four subgroups according to the two interacting features. The second step builds four XGB machines with cross-validation to avoid overfitting. The third step calculates a newly defined feature interaction ratio (FIR) for all possible combinations of predictors. Finally, we calculate the empirical p-value from the FIR distribution. Computer simulation studies compared XGB-FI with a multiple regression model containing an interaction term. The results showed that the type I error of XGB-FI is valid at the nominal level of 0.05 when there is no interaction effect. The power of XGB-FI is consistently higher than that of the multiple regression model in all scenarios we examined. In conclusion, the new machine learning algorithm outperforms the conventional statistical model when searching for an interaction.
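
The first and last steps of the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not define the FIR, so the statistic values below are placeholders, and the function names `stratify_by_pair` and `empirical_p_value` are hypothetical. The sketch also assumes binary (0/1) features, that a larger statistic indicates a stronger interaction, and a common "+1" correction for the empirical p-value.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def stratify_by_pair(rows: Sequence[Row], feature_a: str, feature_b: str
                     ) -> Dict[Tuple[int, int], List[Row]]:
    """Split rows into four subgroups keyed by the values of two binary features.

    Assumption: both features are coded 0/1.
    """
    groups: Dict[Tuple[int, int], List[Row]] = {
        (a, b): [] for a in (0, 1) for b in (0, 1)
    }
    for row in rows:
        groups[(int(row[feature_a]), int(row[feature_b]))].append(row)
    return groups


def empirical_p_value(observed: float, reference: Sequence[float]) -> float:
    """Share of reference statistics at least as large as the observed one.

    Assumption: larger values mean a stronger interaction, and the "+1"
    correction counts the observed value as part of the distribution.
    """
    exceed = sum(1 for value in reference if value >= observed)
    return (exceed + 1) / (len(reference) + 1)


# Toy data: two binary features x1, x2 and an outcome y (made-up values).
rows = [
    {"x1": 0, "x2": 0, "y": 1.0},
    {"x1": 0, "x2": 1, "y": 1.2},
    {"x1": 1, "x2": 0, "y": 0.9},
    {"x1": 1, "x2": 1, "y": 3.1},
    {"x1": 1, "x2": 1, "y": 2.9},
]
groups = stratify_by_pair(rows, "x1", "x2")

# Placeholder statistics standing in for FIR values of other feature pairs.
reference_stats = [1.0, 2.0, 3.0, 4.0]
p_value = empirical_p_value(5.0, reference_stats)
```

In this sketch each subgroup in `groups` would be handed to its own cross-validated model, and the resulting per-pair statistics would form the reference distribution passed to `empirical_p_value`.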
format Online
Article
Text
id pubmed-8871671
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8871671 2022-02-25 A Novel Algorithm to Estimate the Significance Level of a Feature Interaction Using the Extreme Gradient Boosting Machine Guo, Chao-Yu Chang, Ke-Hao Int J Environ Res Public Health Article MDPI 2022-02-18 /pmc/articles/PMC8871671/ /pubmed/35206527 http://dx.doi.org/10.3390/ijerph19042338 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Novel Algorithm to Estimate the Significance Level of a Feature Interaction Using the Extreme Gradient Boosting Machine
title_sort novel algorithm to estimate the significance level of a feature interaction using the extreme gradient boosting machine
topic Article