
Interpretation of Machine Learning Models for Data Sets with Many Features Using Feature Importance

Feature importance (FI) is used to interpret the machine learning model y = f(x) constructed between the explanatory variables or features, x, and the objective variables, y. For a large number of features, interpreting the model in the order of increasing FI is inefficient when th...


Bibliographic Details
Main author:	Kaneko, Hiromasa
Format:	Online Article Text
Language:	English
Published:	American Chemical Society 2023
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10308517/ https://www.ncbi.nlm.nih.gov/pubmed/37396269 http://dx.doi.org/10.1021/acsomega.3c03722

_version_	1785066259769982976
author	Kaneko, Hiromasa
collection	PubMed
description	Feature importance (FI) is used to interpret the machine learning model y = f(x) constructed between the explanatory variables or features, x, and the objective variables, y. For a large number of features, interpreting the model in the order of increasing FI is inefficient when there are similarly important features. Therefore, in this study, a method is developed to interpret models by considering the similarities between the features in addition to the FI. The cross-validated permutation feature importance (CVPFI), which can be calculated using any machine learning method and can handle multicollinearity problems, is used as the FI, while the absolute correlation and maximal information coefficients are used as metrics of feature similarity. Machine learning models could be effectively interpreted by considering the features from the Pareto fronts, where CVPFI is large and the feature similarity is small. Analyses of actual molecular and material data sets confirm that the proposed method enables the accurate interpretation of machine learning models.
format	Online Article Text
id	pubmed-10308517
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	American Chemical Society
record_format	MEDLINE/PubMed
spelling	pubmed-10308517 2023-06-30 Interpretation of Machine Learning Models for Data Sets with Many Features Using Feature Importance Kaneko, Hiromasa ACS Omega Feature importance (FI) is used to interpret the machine learning model y = f(x) constructed between the explanatory variables or features, x, and the objective variables, y. For a large number of features, interpreting the model in the order of increasing FI is inefficient when there are similarly important features. Therefore, in this study, a method is developed to interpret models by considering the similarities between the features in addition to the FI. The cross-validated permutation feature importance (CVPFI), which can be calculated using any machine learning method and can handle multicollinearity problems, is used as the FI, while the absolute correlation and maximal information coefficients are used as metrics of feature similarity. Machine learning models could be effectively interpreted by considering the features from the Pareto fronts, where CVPFI is large and the feature similarity is small. Analyses of actual molecular and material data sets confirm that the proposed method enables the accurate interpretation of machine learning models. American Chemical Society 2023-06-14 /pmc/articles/PMC10308517/ /pubmed/37396269 http://dx.doi.org/10.1021/acsomega.3c03722 Text en © 2023 The Author. Published by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title	Interpretation of Machine Learning Models for Data Sets with Many Features Using Feature Importance
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10308517/ https://www.ncbi.nlm.nih.gov/pubmed/37396269 http://dx.doi.org/10.1021/acsomega.3c03722
work_keys_str_mv	AT kanekohiromasa interpretationofmachinelearningmodelsfordatasetswithmanyfeaturesusingfeatureimportance
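
The abstract describes selecting features from Pareto fronts where importance is large and feature similarity is small. The following is a minimal sketch of that idea only, not the author's implementation. It assumes each feature has already been given one importance score and one similarity score (for instance, its largest absolute correlation with other features); the record does not say how these are computed per feature. The function name `pareto_front`, the variable `example_scores`, and all numbers are hypothetical illustrations, not results from the article.

```python
def pareto_front(scores):
    """Return the names of features that no other feature dominates.

    scores maps a feature name to (importance, similarity).
    Larger importance is better; smaller similarity is better.
    Feature B dominates feature A when B is at least as good on both
    criteria and strictly better on at least one.
    """
    front = []
    for name, (imp, sim) in scores.items():
        dominated = any(
            o_imp >= imp and o_sim <= sim and (o_imp > imp or o_sim < sim)
            for other, (o_imp, o_sim) in scores.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (importance, similarity) pairs, made up for illustration.
example_scores = {
    "x1": (0.40, 0.90),  # most important, but highly similar to others
    "x2": (0.35, 0.20),
    "x3": (0.10, 0.05),  # least similar to others
    "x4": (0.30, 0.60),  # worse than x2 on both criteria
    "x5": (0.05, 0.50),  # worse than x2 on both criteria
}

print(pareto_front(example_scores))
```

In this toy input, `x4` and `x5` are dropped because `x2` is both more important and less similar, while `x1`, `x2`, and `x3` each represent a different trade-off and remain candidates for interpretation.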


Similar Items