Compressed kNN: K-Nearest Neighbors with Data Compression

Bibliographic Details
Main Authors: Salvador–Meneses, Jaime, Ruiz–Chavez, Zoila, Garcia–Rodriguez, Jose
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514715/
https://www.ncbi.nlm.nih.gov/pubmed/33266949
http://dx.doi.org/10.3390/e21030234
_version_ 1783586652104425472
author Salvador–Meneses, Jaime
Ruiz–Chavez, Zoila
Garcia–Rodriguez, Jose
author_facet Salvador–Meneses, Jaime
Ruiz–Chavez, Zoila
Garcia–Rodriguez, Jose
author_sort Salvador–Meneses, Jaime
collection PubMed
description The kNN (k-nearest neighbors) classification algorithm is one of the most widely used non-parametric classification methods; however, it is limited by memory consumption tied to the size of the dataset, which makes it impractical for large volumes of data. Variations of this method have been proposed, such as condensed kNN, which divides the training dataset into clusters to be classified; other variations reduce the input dataset before applying the algorithm. This paper presents a variation of the kNN algorithm, of the structure-less NN type, designed to work with categorical data. Categorical data, by their nature, can be compressed to decrease the memory required when executing the classification. The method adds a prior compression phase and then applies the algorithm directly to the compressed data. This allows the whole dataset to be kept in memory, which leads to a considerable reduction in the amount of memory required. Experiments and tests carried out on well-known datasets show a reduction in the volume of information stored in memory while maintaining classification accuracy. They also show a slight decrease in processing time because the information is decompressed in real time (on-the-fly) while the algorithm is running.
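The approach described above (bit-pack the categorical codes, keep the packed dataset in memory, and decompress rows on the fly during classification) can be illustrated with a minimal sketch. The toy data, the fixed 2-bit-per-attribute packing, and all helper names here are illustrative assumptions, not the authors' actual implementation:

```python
from collections import Counter

BITS = 2        # assumes every attribute has at most 4 categories
N_ATTRS = 3     # attributes per record in this toy example

def pack_row(codes):
    """Compress one record: pack N_ATTRS small integer codes into one word."""
    word = 0
    for c in codes:
        word = (word << BITS) | int(c)
    return word

def unpack_row(word):
    """On-the-fly decompression: recover the attribute codes from one word."""
    codes = []
    for _ in range(N_ATTRS):
        codes.append(word & ((1 << BITS) - 1))
        word >>= BITS
    return codes[::-1]

# Toy categorical training set, already mapped to integer codes.
X_raw = [[0, 1, 2], [0, 1, 0], [1, 0, 2], [1, 0, 1]]
y_train = [0, 0, 1, 1]
X_packed = [pack_row(r) for r in X_raw]   # one word per row instead of three values

def knn_predict(x_codes, X_packed, y, k=3):
    """Majority vote over the k nearest rows, using Hamming distance
    (number of mismatched attributes). Each row is decompressed only
    while its distance is being computed."""
    dists = []
    for i, word in enumerate(X_packed):
        codes = unpack_row(word)
        dists.append((sum(a != b for a, b in zip(codes, x_codes)), i))
    dists.sort()
    votes = Counter(y[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]

print(knn_predict([1, 0, 0], X_packed, y_train))  # -> 1
```

With 2 bits per attribute, this packing stores the whole dataset at a quarter of the size of byte-wide codes, and the classifier never needs an uncompressed copy of the full dataset in memory.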
format Online
Article
Text
id pubmed-7514715
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7514715 2020-11-09 Compressed kNN: K-Nearest Neighbors with Data Compression Salvador–Meneses, Jaime Ruiz–Chavez, Zoila Garcia–Rodriguez, Jose Entropy (Basel) Article MDPI 2019-02-28 /pmc/articles/PMC7514715/ /pubmed/33266949 http://dx.doi.org/10.3390/e21030234 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Salvador–Meneses, Jaime
Ruiz–Chavez, Zoila
Garcia–Rodriguez, Jose
Compressed kNN: K-Nearest Neighbors with Data Compression
title Compressed kNN: K-Nearest Neighbors with Data Compression
title_full Compressed kNN: K-Nearest Neighbors with Data Compression
title_fullStr Compressed kNN: K-Nearest Neighbors with Data Compression
title_full_unstemmed Compressed kNN: K-Nearest Neighbors with Data Compression
title_short Compressed kNN: K-Nearest Neighbors with Data Compression
title_sort compressed knn: k-nearest neighbors with data compression
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514715/
https://www.ncbi.nlm.nih.gov/pubmed/33266949
http://dx.doi.org/10.3390/e21030234
work_keys_str_mv AT salvadormenesesjaime compressedknnknearestneighborswithdatacompression
AT ruizchavezzoila compressedknnknearestneighborswithdatacompression
AT garciarodriguezjose compressedknnknearestneighborswithdatacompression