Two-point-based binary search trees for accelerating big data classification using KNN

Bibliographic Details
Main author: Hassanat, Ahmad B. A.
Format: Online article (text)
Language: English
Published: Public Library of Science, 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6257916/
https://www.ncbi.nlm.nih.gov/pubmed/30475862
http://dx.doi.org/10.1371/journal.pone.0207772

Description: Big data classification is very slow when using traditional machine learning classifiers, particularly when using a lazy and slow-by-nature classifier such as the k-nearest neighbors algorithm (KNN). This paper proposes a new approach which is based on sorting the feature vectors of training data in a binary search tree to accelerate big data classification using the KNN approach. This is done using two methods, both of which utilize two local points to sort the examples based on their similarity to these local points. The first method chooses the local points based on their similarity to the global extreme points, while the second method chooses the local points randomly. The results of various experiments conducted on different big datasets show reasonable accuracy rates compared to state-of-the-art methods and the KNN classifier itself. More importantly, they show the high classification speed of both methods. This strong trait can be used to further improve the accuracy of the proposed methods.
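The description gives the method only at a high level. The following is a minimal Python sketch of a two-point split tree in general, not the author's exact algorithm. Several details are assumptions, not taken from the record: Euclidean distance; the "global extreme points" read as the per-feature minimum and maximum vectors of the examples at a node; a hypothetical leaf-size threshold `MAX_LEAF`; and plain KNN over the single leaf a query reaches. The names `Node`, `build`, `classify` and `choose_local_points` are illustrative only.

```python
import math
import random
from collections import Counter

MAX_LEAF = 8  # hypothetical: stop splitting at this many examples or fewer


class Node:
    def __init__(self, data=None, p1=None, p2=None, left=None, right=None):
        self.data = data            # list of (vector, label) pairs; set only on leaves
        self.p1, self.p2 = p1, p2   # the two local points used to split this node
        self.left, self.right = left, right


def extreme_points(vectors):
    # Assumption: extreme points = per-feature minimum and maximum vectors.
    lo = [min(col) for col in zip(*vectors)]
    hi = [max(col) for col in zip(*vectors)]
    return lo, hi


def choose_local_points(data, method, rng):
    vectors = [v for v, _ in data]
    if method == "random":
        # Second method in the description: two local points chosen at random.
        a, b = rng.sample(range(len(vectors)), 2)
        return vectors[a], vectors[b]
    # First method (as assumed here): the examples nearest to each extreme point.
    lo, hi = extreme_points(vectors)
    p1 = min(vectors, key=lambda v: math.dist(v, lo))
    p2 = min(vectors, key=lambda v: math.dist(v, hi))
    return p1, p2


def build(data, method="extreme", rng=None):
    rng = rng or random.Random(0)
    if len(data) <= MAX_LEAF or len({y for _, y in data}) == 1:
        return Node(data=data)  # small or pure node becomes a leaf
    p1, p2 = choose_local_points(data, method, rng)
    # Each example goes to the side of the local point it is closer to (ties go left).
    left = [(v, y) for v, y in data if math.dist(v, p1) <= math.dist(v, p2)]
    right = [(v, y) for v, y in data if math.dist(v, p1) > math.dist(v, p2)]
    if not left or not right:  # degenerate split (e.g. identical local points)
        return Node(data=data)
    return Node(p1=p1, p2=p2,
                left=build(left, method, rng), right=build(right, method, rng))


def classify(tree, x, k=1):
    node = tree
    # Descend with the same closer-local-point rule used while building.
    while node.data is None:
        node = node.left if math.dist(x, node.p1) <= math.dist(x, node.p2) else node.right
    # Plain KNN restricted to the examples stored in the reached leaf.
    nearest = sorted(node.data, key=lambda item: math.dist(x, item[0]))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(1)
    # Two well-separated Gaussian blobs labelled "a" and "b".
    train = [((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(100)]
    train += [((rng.gauss(20, 1), rng.gauss(20, 1)), "b") for _ in range(100)]
    tree = build(train, method="extreme")
    print(classify(tree, (0.0, 0.0), k=3))    # a
    print(classify(tree, (20.0, 20.0), k=3))  # b
```

In this sketch the tree is built once; each query then costs one pair of distance computations per level plus a KNN search over a single small leaf instead of the whole training set, which is the kind of speed-up the description refers to. The price is that a query's true nearest neighbors may sit in a neighboring leaf, so accuracy can fall below that of exhaustive KNN.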

Journal: PLoS One (Research Article), published 2018-11-26
Source: PubMed, National Center for Biotechnology Information
License: © 2018 Ahmad B. A. Hassanat. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.