K Important Neighbors: A Novel Approach to Binary Classification in High Dimensional Data
K nearest neighbors (KNN) is known as one of the simplest nonparametric classifiers, but in high dimensional settings the accuracy of KNN is affected by nuisance features. In this study, we proposed K important neighbors (KIN) as a novel approach for binary classification in high dimensional problems...
Main Authors: | Raeisi Shahraki, Hadi; Pourahmad, Saeedeh; Zare, Najaf |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi, 2017 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5742505/ https://www.ncbi.nlm.nih.gov/pubmed/29376076 http://dx.doi.org/10.1155/2017/7560807 |
_version_ | 1783288390689488896 |
---|---|
author | Raeisi Shahraki, Hadi; Pourahmad, Saeedeh; Zare, Najaf
author_facet | Raeisi Shahraki, Hadi; Pourahmad, Saeedeh; Zare, Najaf
author_sort | Raeisi Shahraki, Hadi |
collection | PubMed |
description | K nearest neighbors (KNN) is known as one of the simplest nonparametric classifiers, but in high dimensional settings the accuracy of KNN is affected by nuisance features. In this study, we proposed K important neighbors (KIN) as a novel approach for binary classification in high dimensional problems. To avoid the curse of dimensionality, we implemented smoothly clipped absolute deviation (SCAD) logistic regression at the initial stage and accounted for the importance of each feature in the construction of the dissimilarity measure by imposing feature contributions, defined as a function of the SCAD coefficients, on the Euclidean distance. This hybrid dissimilarity measure, which combines information from both features and distances, enjoys the good properties of SCAD penalized regression and KNN simultaneously. In comparison to KNN, simulation studies showed that KIN performs well in terms of both accuracy and dimension reduction. The proposed approach was found to be capable of eliminating nearly all of the noninformative features because the oracle property of SCAD penalized regression is utilized in the construction of the dissimilarity measure. In very sparse settings, KIN also outperforms support vector machine (SVM) and random forest (RF), which are regarded as the best classifiers. |
format | Online Article Text |
id | pubmed-5742505 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-5742505 2018-01-28 K Important Neighbors: A Novel Approach to Binary Classification in High Dimensional Data Raeisi Shahraki, Hadi; Pourahmad, Saeedeh; Zare, Najaf Biomed Res Int Research Article K nearest neighbors (KNN) is known as one of the simplest nonparametric classifiers, but in high dimensional settings the accuracy of KNN is affected by nuisance features. In this study, we proposed K important neighbors (KIN) as a novel approach for binary classification in high dimensional problems. To avoid the curse of dimensionality, we implemented smoothly clipped absolute deviation (SCAD) logistic regression at the initial stage and accounted for the importance of each feature in the construction of the dissimilarity measure by imposing feature contributions, defined as a function of the SCAD coefficients, on the Euclidean distance. This hybrid dissimilarity measure, which combines information from both features and distances, enjoys the good properties of SCAD penalized regression and KNN simultaneously. In comparison to KNN, simulation studies showed that KIN performs well in terms of both accuracy and dimension reduction. The proposed approach was found to be capable of eliminating nearly all of the noninformative features because the oracle property of SCAD penalized regression is utilized in the construction of the dissimilarity measure. In very sparse settings, KIN also outperforms support vector machine (SVM) and random forest (RF), which are regarded as the best classifiers. Hindawi 2017 2017-12-11 /pmc/articles/PMC5742505/ /pubmed/29376076 http://dx.doi.org/10.1155/2017/7560807 Text en Copyright © 2017 Hadi Raeisi Shahraki et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Raeisi Shahraki, Hadi; Pourahmad, Saeedeh; Zare, Najaf K Important Neighbors: A Novel Approach to Binary Classification in High Dimensional Data |
title | K Important Neighbors: A Novel Approach to Binary Classification in High Dimensional Data
title_full | K Important Neighbors: A Novel Approach to Binary Classification in High Dimensional Data
title_fullStr | K Important Neighbors: A Novel Approach to Binary Classification in High Dimensional Data
title_full_unstemmed | K Important Neighbors: A Novel Approach to Binary Classification in High Dimensional Data
title_short | K Important Neighbors: A Novel Approach to Binary Classification in High Dimensional Data
title_sort | k important neighbors: a novel approach to binary classification in high dimensional data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5742505/ https://www.ncbi.nlm.nih.gov/pubmed/29376076 http://dx.doi.org/10.1155/2017/7560807 |
work_keys_str_mv | AT raeisishahrakihadi kimportantneighborsanovelapproachtobinaryclassificationinhighdimensionaldata AT pourahmadsaeedeh kimportantneighborsanovelapproachtobinaryclassificationinhighdimensionaldata AT zarenajaf kimportantneighborsanovelapproachtobinaryclassificationinhighdimensionaldata |
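
The abstract above describes a two-stage construction: SCAD-penalized logistic regression first estimates how much each feature matters, and those estimates then weight the Euclidean distance used for the K-nearest-neighbor vote. The snippet below is a minimal sketch of that idea, not the authors' implementation: scikit-learn has no SCAD solver, so an L1-penalized logistic regression stands in for the SCAD stage, and the normalized absolute coefficients are assumed as the feature weights (the paper's exact weighting function of the SCAD coefficients may differ).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def fit_kin(X_train, y_train, k=5, C=0.1):
    """Illustrative KIN-style classifier: sparse fit, then importance-weighted KNN."""
    # Stage 1: sparse logistic regression (L1 here as a stand-in for SCAD);
    # nuisance features receive zero or near-zero coefficients.
    sparse_fit = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    sparse_fit.fit(X_train, y_train)
    coefs = np.abs(sparse_fit.coef_.ravel())
    # Assumed weighting: normalized absolute coefficients (fall back to
    # uniform weights if every coefficient was shrunk to zero).
    weights = coefs / coefs.sum() if coefs.sum() > 0 else np.full_like(coefs, 1.0 / coefs.size)

    # Stage 2: rescale features by sqrt(weight) so the plain Euclidean distance
    # becomes the importance-weighted dissimilarity sum_j w_j * (x_j - x'_j)^2.
    scale = np.sqrt(weights)
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train * scale, y_train)
    return knn, scale

def predict_kin(model, X_new):
    knn, scale = model
    return knn.predict(X_new * scale)

# Example on synthetic data: only the first 5 of 100 features are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
y = (X[:, :5].sum(axis=1) + 0.5 * rng.normal(size=200) > 0).astype(int)
model = fit_kin(X[:150], y[:150], k=7)
accuracy = (predict_kin(model, X[150:]) == y[150:]).mean()
print(f"holdout accuracy: {accuracy:.2f}")
```

With this rescaling, features whose coefficients shrink to zero contribute nothing to the dissimilarity, which mirrors the abstract's claim that nearly all noninformative features are eliminated before the neighbor search.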