
K Important Neighbors: A Novel Approach to Binary Classification in High Dimensional Data

K nearest neighbors (KNN) is known as one of the simplest nonparametric classifiers, but in high dimensional settings the accuracy of KNN is affected by nuisance features. In this study, we proposed K important neighbors (KIN) as a novel approach for binary classification in high dimensional problem...


Bibliographic Details
Main authors: Raeisi Shahraki, Hadi, Pourahmad, Saeedeh, Zare, Najaf
Format: Online Article Text
Language: English
Published: Hindawi 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5742505/
https://www.ncbi.nlm.nih.gov/pubmed/29376076
http://dx.doi.org/10.1155/2017/7560807
_version_ 1783288390689488896
author Raeisi Shahraki, Hadi
Pourahmad, Saeedeh
Zare, Najaf
collection PubMed
description K nearest neighbors (KNN) is known as one of the simplest nonparametric classifiers, but in high dimensional settings the accuracy of KNN is affected by nuisance features. In this study, we proposed K important neighbors (KIN) as a novel approach for binary classification in high dimensional problems. To avoid the curse of dimensionality, we implemented smoothly clipped absolute deviation (SCAD) logistic regression at the initial stage and accounted for the importance of each feature when constructing the dissimilarity measure, imposing each feature's contribution on the Euclidean distance as a function of its SCAD coefficient. This hybrid dissimilarity measure, which combines information from both features and distances, retains the good properties of SCAD penalized regression and KNN simultaneously. In simulation studies, KIN performed well in comparison to KNN in terms of both accuracy and dimension reduction. The proposed approach was found to be capable of eliminating nearly all of the noninformative features because it exploits the oracle property of SCAD penalized regression in the construction of the dissimilarity measure. In very sparse settings, KIN also outperformed support vector machine (SVM) and random forest (RF), which are regarded as the best classifiers.
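The two ingredients named in the description can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `scad_penalty` is the standard SCAD penalty of Fan and Li with the conventional default `a = 3.7`, and `weighted_knn_predict` is a hypothetical helper that scales each squared coordinate difference by a nonnegative feature weight before the nearest-neighbor vote. The record does not state which function of the SCAD coefficients KIN uses, so taking the absolute coefficient as the weight is an assumption, and the coefficient vector below is a hand-set stand-in rather than the output of a fitted SCAD logistic regression.

```python
import numpy as np


def scad_penalty(beta, lam, a=3.7):
    """Elementwise SCAD penalty for coefficients `beta` (requires a > 2, lam > 0)."""
    b = np.abs(np.asarray(beta, dtype=float))
    linear = lam * b                                            # |beta| <= lam
    quadratic = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))  # lam < |beta| <= a*lam
    constant = lam**2 * (a + 1) / 2                             # |beta| > a*lam
    return np.where(b <= lam, linear, np.where(b <= a * lam, quadratic, constant))


def weighted_knn_predict(X_train, y_train, X_test, weights, k=5):
    """Binary (0/1) KNN vote under a feature-weighted Euclidean distance.

    Hypothetical helper: distance(x, z) = sqrt(sum_j weights[j] * (x_j - z_j)^2).
    A weight of zero removes that feature from the distance entirely.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    w = np.asarray(weights, dtype=float)
    # Pairwise differences, shape (n_test, n_train, n_features).
    diff = X_test[:, None, :] - X_train[None, :, :]
    dist = np.sqrt((w * diff**2).sum(axis=2))
    # Indices of the k closest training points for each test point.
    nearest = np.argsort(dist, axis=1)[:, :k]
    # Majority vote; use an odd k to avoid ties.
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


# Toy data: only the first 2 of 50 features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# ASSUMPTION: stand-in for coefficients from a SCAD logistic regression fit,
# which would ideally be zero on the 48 noise features.
beta = np.zeros(50)
beta[:2] = 1.0
weights = np.abs(beta)  # ASSUMPTION: weight = |coefficient|

pred_weighted = weighted_knn_predict(X[:150], y[:150], X[150:], weights, k=5)
pred_plain = weighted_knn_predict(X[:150], y[:150], X[150:], np.ones(50), k=5)
```

Setting all weights to one recovers ordinary KNN, so comparing `pred_weighted` and `pred_plain` against `y[150:]` gives a rough feel for how zero-weighted nuisance features stop influencing the neighbor search.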
format Online
Article
Text
id pubmed-5742505
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-5742505 2018-01-28
K Important Neighbors: A Novel Approach to Binary Classification in High Dimensional Data
Raeisi Shahraki, Hadi
Pourahmad, Saeedeh
Zare, Najaf
Biomed Res Int
Research Article
Hindawi 2017
2017-12-11
/pmc/articles/PMC5742505/
/pubmed/29376076
http://dx.doi.org/10.1155/2017/7560807
Text
en
Copyright © 2017 Hadi Raeisi Shahraki et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title K Important Neighbors: A Novel Approach to Binary Classification in High Dimensional Data
topic Research Article