Robust Distance Measures for kNN Classification of Cancer Data

Bibliographic Details
Main Authors:	Ehsani, Rezvan; Drabløs, Finn
Format:	Online Article Text
Language:	English
Published:	SAGE Publications 2020
Subjects:	Original Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7573750/ https://www.ncbi.nlm.nih.gov/pubmed/33116353 http://dx.doi.org/10.1177/1176935120965542

author	Ehsani, Rezvan; Drabløs, Finn
collection	PubMed
description	The k-Nearest Neighbor (kNN) classifier represents a simple and very general approach to classification. Still, the performance of kNN classifiers can often compete with more complex machine-learning algorithms. The core of kNN depends on a “guilt by association” principle where classification is performed by measuring the similarity between a query and a set of training patterns, often computed as distances. The relative performance of kNN classifiers is closely linked to the choice of distance or similarity measure, and it is therefore relevant to investigate the effect of using different distance measures when comparing biomedical data. In this study on classification of cancer data sets, we have used both common and novel distance measures, including the novel distance measures Sobolev and Fisher, and we have evaluated the performance of kNN with these distances on 4 cancer data sets of different type. We find that the performance when using the novel distance measures is comparable to the performance with more well-established measures, in particular for the Sobolev distance. We define a robust ranking of all the distance measures according to overall performance. Several distance measures show robust performance in kNN over several data sets, in particular the Hassanat, Sobolev, and Manhattan measures. Some of the other measures show good performance on selected data sets but seem to be more sensitive to the nature of the classification data. It is therefore important to benchmark distance measures on similar data prior to classification to identify the most suitable measure in each case.
format	Online Article Text
id	pubmed-7573750
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	SAGE Publications
record_format	MEDLINE/PubMed
spelling	pubmed-75737502020-10-27 Robust Distance Measures for kNN Classification of Cancer Data Ehsani, Rezvan Drabløs, Finn Cancer Inform Original Research The k-Nearest Neighbor (kNN) classifier represents a simple and very general approach to classification. Still, the performance of kNN classifiers can often compete with more complex machine-learning algorithms. The core of kNN depends on a “guilt by association” principle where classification is performed by measuring the similarity between a query and a set of training patterns, often computed as distances. The relative performance of kNN classifiers is closely linked to the choice of distance or similarity measure, and it is therefore relevant to investigate the effect of using different distance measures when comparing biomedical data. In this study on classification of cancer data sets, we have used both common and novel distance measures, including the novel distance measures Sobolev and Fisher, and we have evaluated the performance of kNN with these distances on 4 cancer data sets of different type. We find that the performance when using the novel distance measures is comparable to the performance with more well-established measures, in particular for the Sobolev distance. We define a robust ranking of all the distance measures according to overall performance. Several distance measures show robust performance in kNN over several data sets, in particular the Hassanat, Sobolev, and Manhattan measures. Some of the other measures show good performance on selected data sets but seem to be more sensitive to the nature of the classification data. It is therefore important to benchmark distance measures on similar data prior to classification to identify the most suitable measure in each case. 
SAGE Publications 2020-10-13 /pmc/articles/PMC7573750/ /pubmed/33116353 http://dx.doi.org/10.1177/1176935120965542 Text en © The Author(s) 2020 https://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title	Robust Distance Measures for kNN Classification of Cancer Data
topic	Original Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7573750/ https://www.ncbi.nlm.nih.gov/pubmed/33116353 http://dx.doi.org/10.1177/1176935120965542
work_keys_str_mv	AT ehsanirezvan robustdistancemeasuresforknnclassificationofcancerdata AT drabløsfinn robustdistancemeasuresforknnclassificationofcancerdata
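The description explains kNN as "guilt by association": a query is labelled by the training patterns nearest to it, and the choice of distance measure drives performance. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It plugs two of the measures named in the abstract (Manhattan and Hassanat) into a majority-vote kNN classifier. The Hassanat formula follows the commonly cited per-dimension definition and should be treated as an assumption, and Sobolev and Fisher are omitted because this record does not define them. The function names (`manhattan`, `hassanat`, `knn_predict`, `loo_accuracy`) and the toy data are hypothetical. Leave-one-out accuracy stands in for the kind of benchmarking on similar data that the abstract recommends; it is not necessarily the evaluation protocol used in the study.

```python
from collections import Counter
from typing import Callable, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def manhattan(x: Vector, y: Vector) -> float:
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def hassanat(x: Vector, y: Vector) -> float:
    """Hassanat distance (assumed per-dimension form); each term lies in [0, 1)."""
    total = 0.0
    for a, b in zip(x, y):
        lo, hi = min(a, b), max(a, b)
        if lo >= 0:
            total += 1 - (1 + lo) / (1 + hi)
        else:
            # Shift both values by |lo| so numerator and denominator stay positive.
            total += 1 - (1 + lo + abs(lo)) / (1 + hi + abs(lo))
    return total


def knn_predict(train_x: List[Vector], train_y: List[str], query: Vector,
                k: int = 3, distance: Distance = manhattan) -> str:
    """Label the query by majority vote among its k nearest training patterns."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(xs: List[Vector], ys: List[str], k: int, distance: Distance) -> float:
    """Leave-one-out accuracy: classify each pattern using all the others."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(rest_x, rest_y, xs[i], k, distance) == ys[i]:
            hits += 1
    return hits / len(xs)


if __name__ == "__main__":
    # Toy two-class data, invented for illustration (not from the study).
    xs = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
    ys = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
    print(knn_predict(xs, ys, [0.15, 0.1], k=3))  # normal
    print(knn_predict(xs, ys, [4.9, 5.0], k=3, distance=hassanat))  # tumor
    # Compare distance measures on the same data before choosing one.
    for name, dist in [("manhattan", manhattan), ("hassanat", hassanat)]:
        print(name, loo_accuracy(xs, ys, 3, dist))
```

To compare measures, you only need to swap the `distance` argument. On real biomedical data sets, the measures may rank differently from one data set to the next. That is the data-dependence the abstract describes.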