
The k conditional nearest neighbor algorithm for classification and class probability estimation


Bibliographic Details
Main authors:	Gweon, Hyukjun; Schonlau, Matthias; Steiner, Stefan H.
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc., 2019
Subjects:	Data Mining and Machine Learning
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924495/ https://www.ncbi.nlm.nih.gov/pubmed/33816847 http://dx.doi.org/10.7717/peerj-cs.194

collection	PubMed
description	The k nearest neighbor (kNN) approach is a simple and effective nonparametric algorithm for classification. One drawback of kNN is that it can give only coarse estimates of class probabilities, particularly for low values of k. To avoid this drawback, we propose a new nonparametric classification method, the k conditional nearest neighbor (kCNN) algorithm, based on nearest neighbors conditional on each class: it calculates the distance between a new instance and the kth nearest neighbor from each class, estimates the posterior probabilities of class membership from these distances, and assigns the instance to the class with the largest posterior. We prove that the proposed approach converges to the Bayes classifier as the size of the training data increases. Further, we extend the proposed approach to an ensemble method. Experiments on benchmark data sets show that both the proposed approach and its ensemble version on average outperform kNN, weighted kNN, probabilistic kNN and two similar algorithms (LMkNN and MLM-kHNN) in terms of the error rate. A simulation shows that kCNN may be useful for estimating posterior probabilities when the class distributions overlap.
id	pubmed-7924495
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	PeerJ Comput Sci (PeerJ Computer Science), Data Mining and Machine Learning
published	PeerJ Inc., 2019-05-13
rights	©2019 Gweon et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
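
The classification rule summarized in the abstract (distance to the kth nearest neighbor within each class, posteriors estimated from those distances, prediction by the largest posterior) can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation. It assumes Euclidean distance and assumes each class posterior is proportional to the inverse of that class's kth-neighbor distance, because this record does not state the exact estimator. The function names (`kcnn_posteriors`, `kcnn_predict`, `kcnn_ensemble_posteriors`) and the ensemble rule (averaging posteriors over several values of k) are hypothetical.

```python
import numpy as np


def kcnn_posteriors(X_train, y_train, x, k=1, eps=1e-12):
    """Sketch of kCNN-style class posteriors for a single query point x.

    ASSUMPTION: P(c | x) is taken proportional to 1 / d_k(c), where d_k(c)
    is the Euclidean distance from x to its kth nearest neighbor in class c.
    The paper's exact estimator may differ.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)
    classes = np.unique(y_train)
    inv_d = np.empty(len(classes))
    for i, c in enumerate(classes):
        # Distances from x to every training point of class c only
        d = np.linalg.norm(X_train[y_train == c] - x, axis=1)
        if len(d) < k:
            raise ValueError(f"class {c!r} has fewer than k={k} points")
        # kth smallest distance within class c (k is 1-based)
        d_k = np.partition(d, k - 1)[k - 1]
        inv_d[i] = 1.0 / (d_k + eps)  # eps avoids division by zero
    return classes, inv_d / inv_d.sum()


def kcnn_predict(X_train, y_train, x, k=1):
    """Assign x to the class with the largest estimated posterior."""
    classes, post = kcnn_posteriors(X_train, y_train, x, k)
    return classes[np.argmax(post)]


def kcnn_ensemble_posteriors(X_train, y_train, x, ks=(1, 2)):
    """HYPOTHETICAL ensemble: average the posteriors over several k values."""
    posts = [kcnn_posteriors(X_train, y_train, x, k)[1] for k in ks]
    classes = np.unique(np.asarray(y_train))
    return classes, np.mean(posts, axis=0)


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [5, 5], [6, 5]]
    y = ["a", "a", "b", "b"]
    print(kcnn_predict(X, y, [0.2, 0.3], k=1))
    print(kcnn_posteriors(X, y, [0.2, 0.3], k=1))
```

Because the estimate uses one distance per class, the posterior varies continuously with the query point even when k = 1, whereas the vote fractions of plain kNN can only take the values 0, 1/k, 2/k, ..., 1.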


Similar items