
Chi-MIC-share: a new feature selection algorithm for quantitative structure–activity relationship models

Quantitative structure–activity relationship models are used in toxicology to predict the effects of organic compounds on aquatic organisms. Common filter feature selection methods use correlation statistics to rank features, but this approach considers only the correlation between a single feature...

Bibliographic Details
Main authors: Li, Yuting, Dai, Zhijun, Cao, Dan, Luo, Feng, Chen, Yuan, Yuan, Zheming
Format: Online Article Text
Language: English
Published: The Royal Society of Chemistry 2020
Subjects: Chemistry
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9054197/
https://www.ncbi.nlm.nih.gov/pubmed/35520405
http://dx.doi.org/10.1039/d0ra00061b
author Li, Yuting
Dai, Zhijun
Cao, Dan
Luo, Feng
Chen, Yuan
Yuan, Zheming
collection PubMed
description Quantitative structure–activity relationship models are used in toxicology to predict the effects of organic compounds on aquatic organisms. Common filter feature selection methods use correlation statistics to rank features, but this approach considers only the correlation between a single feature and the response variable and does not take into account feature redundancy. Although the minimal redundancy maximal relevance approach considers the redundancy among features, direct removal of the redundant features may result in loss of prediction accuracy, and cross-validation of training sets to select an optimal subset of features is time-consuming. In this paper, we describe the development of a feature selection method, Chi-MIC-share, which can terminate feature selection automatically and is based on an improved maximal information coefficient and a redundant allocation strategy. We validated Chi-MIC-share using three environmental toxicology datasets and a support vector regression model. The results show that Chi-MIC-share is more accurate than other feature selection methods. We also performed a significance test on the model and analyzed the single-factor effects of the reserved descriptors.
format Online
Article
Text
id pubmed-9054197
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher The Royal Society of Chemistry
record_format MEDLINE/PubMed
spelling pubmed-9054197 2022-05-04
Chi-MIC-share: a new feature selection algorithm for quantitative structure–activity relationship models
Li, Yuting
Dai, Zhijun
Cao, Dan
Luo, Feng
Chen, Yuan
Yuan, Zheming
RSC Adv
Chemistry
Quantitative structure–activity relationship models are used in toxicology to predict the effects of organic compounds on aquatic organisms. Common filter feature selection methods use correlation statistics to rank features, but this approach considers only the correlation between a single feature and the response variable and does not take into account feature redundancy. Although the minimal redundancy maximal relevance approach considers the redundancy among features, direct removal of the redundant features may result in loss of prediction accuracy, and cross-validation of training sets to select an optimal subset of features is time-consuming. In this paper, we describe the development of a feature selection method, Chi-MIC-share, which can terminate feature selection automatically and is based on an improved maximal information coefficient and a redundant allocation strategy. We validated Chi-MIC-share using three environmental toxicology datasets and a support vector regression model. The results show that Chi-MIC-share is more accurate than other feature selection methods. We also performed a significance test on the model and analyzed the single-factor effects of the reserved descriptors.
The Royal Society of Chemistry 2020-05-27
/pmc/articles/PMC9054197/
/pubmed/35520405
http://dx.doi.org/10.1039/d0ra00061b
Text en
This journal is © The Royal Society of Chemistry
https://creativecommons.org/licenses/by-nc/3.0/
title Chi-MIC-share: a new feature selection algorithm for quantitative structure–activity relationship models
topic Chemistry
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9054197/
https://www.ncbi.nlm.nih.gov/pubmed/35520405
http://dx.doi.org/10.1039/d0ra00061b