
Weighted Mean Squared Deviation Feature Screening for Binary Features

In this study, we propose a novel model-free feature screening method for ultrahigh dimensional binary features of binary classification, called weighted mean squared deviation (WMSD). Compared to Chi-square statistic and mutual information, WMSD provides more opportunities to the binary features wi...


Bibliographic Details
Main authors: Wang, Gaizhen, Guan, Guoyu
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516793/
https://www.ncbi.nlm.nih.gov/pubmed/33286109
http://dx.doi.org/10.3390/e22030335
_version_ 1783587083262099456
author Wang, Gaizhen
Guan, Guoyu
author_facet Wang, Gaizhen
Guan, Guoyu
author_sort Wang, Gaizhen
collection PubMed
description In this study, we propose a novel model-free feature screening method for ultrahigh dimensional binary features of binary classification, called weighted mean squared deviation (WMSD). Compared to Chi-square statistic and mutual information, WMSD provides more opportunities to the binary features with probabilities near 0.5. In addition, the asymptotic properties of the proposed method are theoretically investigated under the assumption [Formula: see text]. The number of features is practically selected by a Pearson correlation coefficient method according to the property of power-law distribution. Lastly, an empirical study of Chinese text classification illustrates that the proposed method performs well when the dimension of selected features is relatively small.
format Online
Article
Text
id pubmed-7516793
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-75167932020-11-09 Weighted Mean Squared Deviation Feature Screening for Binary Features Wang, Gaizhen Guan, Guoyu Entropy (Basel) Article In this study, we propose a novel model-free feature screening method for ultrahigh dimensional binary features of binary classification, called weighted mean squared deviation (WMSD). Compared to Chi-square statistic and mutual information, WMSD provides more opportunities to the binary features with probabilities near 0.5. In addition, the asymptotic properties of the proposed method are theoretically investigated under the assumption [Formula: see text]. The number of features is practically selected by a Pearson correlation coefficient method according to the property of power-law distribution. Lastly, an empirical study of Chinese text classification illustrates that the proposed method performs well when the dimension of selected features is relatively small. MDPI 2020-03-14 /pmc/articles/PMC7516793/ /pubmed/33286109 http://dx.doi.org/10.3390/e22030335 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Weighted Mean Squared Deviation Feature Screening for Binary Features
topic Article