The k conditional nearest neighbor algorithm for classification and class probability estimation
The k nearest neighbor (kNN) approach is a simple and effective nonparametric algorithm for classification. One of the drawbacks of kNN is that the method can only give coarse estimates of class probabilities, particularly for low values of k. To avoid this drawback, we propose a new nonparametric classification method based on nearest neighbors conditional on each class: the proposed approach calculates the distance between a new instance and the kth nearest neighbor from each class, estimates posterior probabilities of class memberships using the distances, and assigns the instance to the class with the largest posterior. We prove that the proposed approach converges to the Bayes classifier as the size of the training data increases. Further, we extend the proposed approach to an ensemble method. Experiments on benchmark data sets show that both the proposed approach and the ensemble version of the proposed approach on average outperform kNN, weighted kNN, probabilistic kNN and two similar algorithms (LMkNN and MLM-kHNN) in terms of the error rate. A simulation shows that kCNN may be useful for estimating posterior probabilities when the class distributions overlap.
Main Authors: | Gweon, Hyukjun; Schonlau, Matthias; Steiner, Stefan H. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc., 2019 |
Subjects: | Data Mining and Machine Learning |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924495/ https://www.ncbi.nlm.nih.gov/pubmed/33816847 http://dx.doi.org/10.7717/peerj-cs.194 |
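The abstract describes the kCNN idea: measure the distance from a new instance to the kth nearest neighbor *within each class* and convert those per-class distances into posterior estimates. Below is a minimal sketch of that idea. It assumes Euclidean distance, and the posterior formula (normalized inverse distances) is an illustrative stand-in, not the paper's exact estimator; the function name `kcnn_predict` is likewise hypothetical.

```python
import numpy as np

def kcnn_predict(X_train, y_train, x_new, k=3):
    """Sketch of k conditional nearest neighbor (kCNN) classification.

    For each class c, compute the distance from x_new to the kth
    nearest training point of class c, then score classes so that a
    closer kth neighbor yields a larger posterior. The inverse-distance
    posterior below is an assumption for illustration only.
    """
    classes = np.unique(y_train)
    kth_dist = {}
    for c in classes:
        Xc = X_train[y_train == c]
        d = np.sort(np.linalg.norm(Xc - x_new, axis=1))  # Euclidean distances, ascending
        kth_dist[c] = d[min(k, len(d)) - 1]              # distance to kth NN within class c
    # Normalized inverse distances as a stand-in posterior estimate
    inv = {c: 1.0 / (kth_dist[c] + 1e-12) for c in classes}
    total = sum(inv.values())
    posterior = {c: inv[c] / total for c in classes}
    # Assign to the class with the largest posterior, as in the abstract
    label = max(posterior, key=posterior.get)
    return label, posterior
```

Because any monotone decreasing transform of the per-class kth-NN distances preserves the ranking, the predicted label here matches the abstract's "assign to the class with the largest posterior" rule whenever the smallest conditional kth-NN distance identifies the winning class.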
_version_ | 1783659102590730240 |
---|---|
author | Gweon, Hyukjun Schonlau, Matthias Steiner, Stefan H. |
author_facet | Gweon, Hyukjun Schonlau, Matthias Steiner, Stefan H. |
author_sort | Gweon, Hyukjun |
collection | PubMed |
description | The k nearest neighbor (kNN) approach is a simple and effective nonparametric algorithm for classification. One of the drawbacks of kNN is that the method can only give coarse estimates of class probabilities, particularly for low values of k. To avoid this drawback, we propose a new nonparametric classification method based on nearest neighbors conditional on each class: the proposed approach calculates the distance between a new instance and the kth nearest neighbor from each class, estimates posterior probabilities of class memberships using the distances, and assigns the instance to the class with the largest posterior. We prove that the proposed approach converges to the Bayes classifier as the size of the training data increases. Further, we extend the proposed approach to an ensemble method. Experiments on benchmark data sets show that both the proposed approach and the ensemble version of the proposed approach on average outperform kNN, weighted kNN, probabilistic kNN and two similar algorithms (LMkNN and MLM-kHNN) in terms of the error rate. A simulation shows that kCNN may be useful for estimating posterior probabilities when the class distributions overlap. |
format | Online Article Text |
id | pubmed-7924495 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7924495 2021-04-02 The k conditional nearest neighbor algorithm for classification and class probability estimation Gweon, Hyukjun Schonlau, Matthias Steiner, Stefan H. PeerJ Comput Sci Data Mining and Machine Learning The k nearest neighbor (kNN) approach is a simple and effective nonparametric algorithm for classification. One of the drawbacks of kNN is that the method can only give coarse estimates of class probabilities, particularly for low values of k. To avoid this drawback, we propose a new nonparametric classification method based on nearest neighbors conditional on each class: the proposed approach calculates the distance between a new instance and the kth nearest neighbor from each class, estimates posterior probabilities of class memberships using the distances, and assigns the instance to the class with the largest posterior. We prove that the proposed approach converges to the Bayes classifier as the size of the training data increases. Further, we extend the proposed approach to an ensemble method. Experiments on benchmark data sets show that both the proposed approach and the ensemble version of the proposed approach on average outperform kNN, weighted kNN, probabilistic kNN and two similar algorithms (LMkNN and MLM-kHNN) in terms of the error rate. A simulation shows that kCNN may be useful for estimating posterior probabilities when the class distributions overlap. PeerJ Inc. 2019-05-13 /pmc/articles/PMC7924495/ /pubmed/33816847 http://dx.doi.org/10.7717/peerj-cs.194 Text en ©2019 Gweon et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose, provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either the DOI or URL of the article must be cited. |
spellingShingle | Data Mining and Machine Learning Gweon, Hyukjun Schonlau, Matthias Steiner, Stefan H. The k conditional nearest neighbor algorithm for classification and class probability estimation |
title | The k conditional nearest neighbor algorithm for classification and class probability estimation |
title_full | The k conditional nearest neighbor algorithm for classification and class probability estimation |
title_fullStr | The k conditional nearest neighbor algorithm for classification and class probability estimation |
title_full_unstemmed | The k conditional nearest neighbor algorithm for classification and class probability estimation |
title_short | The k conditional nearest neighbor algorithm for classification and class probability estimation |
title_sort | k conditional nearest neighbor algorithm for classification and class probability estimation |
topic | Data Mining and Machine Learning |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924495/ https://www.ncbi.nlm.nih.gov/pubmed/33816847 http://dx.doi.org/10.7717/peerj-cs.194 |
work_keys_str_mv | AT gweonhyukjun thekconditionalnearestneighboralgorithmforclassificationandclassprobabilityestimation AT schonlaumatthias thekconditionalnearestneighboralgorithmforclassificationandclassprobabilityestimation AT steinerstefanh thekconditionalnearestneighboralgorithmforclassificationandclassprobabilityestimation AT gweonhyukjun kconditionalnearestneighboralgorithmforclassificationandclassprobabilityestimation AT schonlaumatthias kconditionalnearestneighboralgorithmforclassificationandclassprobabilityestimation AT steinerstefanh kconditionalnearestneighboralgorithmforclassificationandclassprobabilityestimation |