The k conditional nearest neighbor algorithm for classification and class probability estimation

Bibliographic details
Main authors: Gweon, Hyukjun; Schonlau, Matthias; Steiner, Stefan H.
Format: Online article, text
Language: English
Published: PeerJ Inc., 2019
Subjects: Data Mining and Machine Learning
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924495/
https://www.ncbi.nlm.nih.gov/pubmed/33816847
http://dx.doi.org/10.7717/peerj-cs.194

Abstract
The k nearest neighbor (kNN) approach is a simple and effective nonparametric algorithm for classification. One of the drawbacks of kNN is that the method can only give coarse estimates of class probabilities, particularly for low values of k. To avoid this drawback, we propose a new nonparametric classification method based on nearest neighbors conditional on each class: the proposed approach calculates the distance between a new instance and the kth nearest neighbor from each class, estimates posterior probabilities of class membership using the distances, and assigns the instance to the class with the largest posterior. We prove that the proposed approach converges to the Bayes classifier as the size of the training data increases. Further, we extend the proposed approach to an ensemble method. Experiments on benchmark data sets show that both the proposed approach and the ensemble version of the proposed approach on average outperform kNN, weighted kNN, probabilistic kNN and two similar algorithms (LMkNN and MLM-kHNN) in terms of the error rate. A simulation shows that kCNN may be useful for estimating posterior probabilities when the class distributions overlap.
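
The abstract describes the kCNN procedure only in words, so the following Python sketch (3.8+, standard library only) contrasts it with plain kNN voting on a toy data set. It is not the authors' implementation. The function names knn_proba, kcnn_proba and kcnn_predict are hypothetical, Euclidean distance is assumed, and because the abstract does not state how the per-class distances become posteriors, the sketch assumes P(c | x) proportional to d_c^(-p), where d_c is the distance from x to the k-th nearest training point of class c and p is the number of features. That form follows from a kNN density estimate within each class, f_c(x) ~ k / (n_c V d_c^p), multiplied by the class prior n_c / n; the paper's exact formula may differ.

```python
import math
from collections import Counter


def knn_proba(X, y, x, k):
    """Plain kNN estimate: share of the k nearest training points in each class."""
    neighbours = sorted(zip(X, y), key=lambda pair: math.dist(pair[0], x))[:k]
    counts = Counter(label for _, label in neighbours)
    # With k neighbours the estimate can only take the values 0, 1/k, ..., 1.
    return {c: counts[c] / k for c in sorted(set(y))}


def kcnn_proba(X, y, x, k, exponent=None):
    """Sketch of a kCNN-style posterior (ASSUMED weighting, see the lead-in).

    For each class c, find d_c, the Euclidean distance from x to the k-th
    nearest training point of class c, then set P(c | x) proportional to
    d_c ** (-exponent). The exponent defaults to the number of features p.
    """
    p = len(x) if exponent is None else exponent
    classes = sorted(set(y))
    kth = {}
    for c in classes:
        dists = sorted(math.dist(xi, x) for xi, yi in zip(X, y) if yi == c)
        if len(dists) < k:
            raise ValueError(f"class {c!r} has fewer than k={k} training points")
        kth[c] = dists[k - 1]
    # If k points of some class coincide with x, its weight would be infinite:
    # split all the probability mass evenly over such classes instead.
    zero = [c for c in classes if kth[c] == 0.0]
    if zero:
        return {c: (1.0 / len(zero) if c in zero else 0.0) for c in classes}
    weights = {c: kth[c] ** (-p) for c in classes}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def kcnn_predict(X, y, x, k):
    """Assign x to the class with the largest estimated posterior."""
    proba = kcnn_proba(X, y, x, k)
    return max(proba, key=proba.get)


if __name__ == "__main__":
    # Toy two-class data in two dimensions (made up for illustration).
    X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (3.0, 3.0), (3.0, 4.0), (4.0, 3.0)]
    y = ["a", "a", "a", "b", "b", "b"]
    print(knn_proba(X, y, (1.0, 1.0), k=1))     # {'a': 1.0, 'b': 0.0}
    print(kcnn_proba(X, y, (1.0, 1.0), k=1))    # a is about 0.889, b about 0.111
    print(kcnn_predict(X, y, (2.0, 2.0), k=1))  # b
```

With k = 1, plain kNN can only report 0 or 1 for each class, whereas the distance-based estimate changes smoothly as x moves between the classes, which is the coarseness issue the abstract raises. The ensemble extension mentioned in the abstract is not sketched here.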

Published in PeerJ Computer Science (PeerJ Comput Sci), subject area Data Mining and Machine Learning, on 2019-05-13.

©2019 Gweon et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.