
Multicollinearity and misleading statistical results

Multicollinearity represents a high degree of linear intercorrelation between explanatory variables in a multiple regression model and leads to incorrect results of regression analyses. Diagnostic tools of multicollinearity include the variance inflation factor (VIF), condition index and condition n...


Bibliographic Details
Main Author: Kim, Jong Hae
Format: Online Article Text
Language: English
Published: Korean Society of Anesthesiologists 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6900425/
https://www.ncbi.nlm.nih.gov/pubmed/31304696
http://dx.doi.org/10.4097/kja.19087
_version_ 1783477356064669696
author Kim, Jong Hae
author_facet Kim, Jong Hae
author_sort Kim, Jong Hae
collection PubMed
description Multicollinearity is a high degree of linear intercorrelation among the explanatory variables in a multiple regression model, and it leads to incorrect results of regression analyses. Diagnostic tools for multicollinearity include the variance inflation factor (VIF), the condition index and condition number, and the variance decomposition proportion (VDP). Multicollinearity can be expressed by the coefficient of determination (R_h^2) of a multiple regression model that takes one explanatory variable (X_h) as its response variable and the others (X_i, i ≠ h) as its explanatory variables. The variance (σ_h^2) of each regression coefficient in the final regression model is proportional to the VIF, 1/(1 − R_h^2). Hence, an increase in R_h^2 (strong multicollinearity) increases σ_h^2. The larger σ_h^2 produces unreliable probability values and confidence intervals for the regression coefficients. The condition index is the square root of the ratio of the maximum eigenvalue to each eigenvalue of the correlation matrix of the standardized explanatory variables. The condition number is the maximum condition index. Multicollinearity is present when the VIF is higher than 5 to 10 or the condition indices are higher than 10 to 30. However, these measures cannot identify which explanatory variables are multicollinear. VDPs obtained from the eigenvectors can identify the multicollinear variables by showing how much of the inflation of σ_h^2 is attributable to each condition index. When two or more VDPs that correspond to a common condition index higher than 10 to 30 are higher than 0.8 to 0.9, their associated explanatory variables are multicollinear. Excluding multicollinear explanatory variables leads to statistically stable multiple regression models.
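The three diagnostics named in the description can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the article's own code: the function names `vif` and `condition_indices_and_vdp`, the simulated data, and the choice to work from the correlation matrix of the explanatory variables (as the description states) are all assumptions made here for demonstration.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X.

    Each column X_h is regressed on the remaining columns (with an
    intercept); VIF_h = 1 / (1 - R_h^2), computed here as SST / SSE.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for h in range(p):
        y = X[:, h]
        A = np.column_stack([np.ones(n), np.delete(X, h, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        sse = resid @ resid                  # residual sum of squares
        sst = np.sum((y - y.mean()) ** 2)    # total sum of squares
        out[h] = sst / sse                   # equals 1 / (1 - R_h^2)
    return out


def condition_indices_and_vdp(X):
    """Condition indices and variance decomposition proportions (VDPs).

    Uses the eigen-decomposition of the correlation matrix of the
    explanatory variables. Returns (cond_idx, vdp) where vdp[j, k] is
    the share of the variance of coefficient k that is associated with
    condition index j; each column of vdp sums to 1.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)     # eigenvalues in ascending order
    cond_idx = np.sqrt(eigvals.max() / eigvals)
    phi = eigvecs ** 2 / eigvals             # phi[k, j] = v_kj^2 / lambda_j
    vdp = (phi / phi.sum(axis=1, keepdims=True)).T
    return cond_idx, vdp


# Hypothetical data: x3 is almost exactly x1 + x2, x4 is unrelated.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.01 * rng.normal(size=n)
x4 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])

vifs = vif(X)
cond_idx, vdp = condition_indices_and_vdp(X)
worst = int(np.argmax(cond_idx))             # row of the condition number

print("VIF:", np.round(vifs, 2))
print("Condition number:", round(float(cond_idx[worst]), 1))
print("VDPs at the condition number:", np.round(vdp[worst], 3))
```

In this simulated example the first three variables should show large VIFs and VDPs close to 1 at the largest condition index, while the fourth should not, which matches the rule stated in the description for picking out the multicollinear variables.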
format Online
Article
Text
id pubmed-6900425
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Korean Society of Anesthesiologists
record_format MEDLINE/PubMed
spelling pubmed-6900425 2019-12-12 Multicollinearity and misleading statistical results Kim, Jong Hae Korean J Anesthesiol Statistical Round
Korean Society of Anesthesiologists 2019-12 2019-07-15 /pmc/articles/PMC6900425/ /pubmed/31304696 http://dx.doi.org/10.4097/kja.19087 Text en Copyright © The Korean Society of Anesthesiologists, 2019 This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Statistical Round
Kim, Jong Hae
Multicollinearity and misleading statistical results
title Multicollinearity and misleading statistical results
title_full Multicollinearity and misleading statistical results
title_fullStr Multicollinearity and misleading statistical results
title_full_unstemmed Multicollinearity and misleading statistical results
title_short Multicollinearity and misleading statistical results
title_sort multicollinearity and misleading statistical results
topic Statistical Round
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6900425/
https://www.ncbi.nlm.nih.gov/pubmed/31304696
http://dx.doi.org/10.4097/kja.19087
work_keys_str_mv AT kimjonghae multicollinearityandmisleadingstatisticalresults