Two-point-based binary search trees for accelerating big data classification using KNN
Main author: | Hassanat, Ahmad B. A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2018 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6257916/ https://www.ncbi.nlm.nih.gov/pubmed/30475862 http://dx.doi.org/10.1371/journal.pone.0207772 |
author | Hassanat, Ahmad B. A. |
---|---|
collection | PubMed |
description | Big data classification is very slow when using traditional machine learning classifiers, particularly when using a lazy and slow-by-nature classifier such as the k-nearest neighbors algorithm (KNN). This paper proposes a new approach which is based on sorting the feature vectors of training data in a binary search tree to accelerate big data classification using the KNN approach. This is done using two methods, both of which utilize two local points to sort the examples based on their similarity to these local points. The first method chooses the local points based on their similarity to the global extreme points, while the second method chooses the local points randomly. The results of various experiments conducted on different big datasets show reasonable accuracy rates compared to state-of-the-art methods and the KNN classifier itself. More importantly, they show the high classification speed of both methods. This strong trait can be used to further improve the accuracy of the proposed methods. |
format | Online Article Text |
id | pubmed-6257916 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
record | pubmed-6257916, 2018-12-06 |
journal | PLoS One |
published | 2018-11-26 |
rights | © 2018 Ahmad B. A. Hassanat. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Two-point-based binary search trees for accelerating big data classification using KNN |
topic | Research Article |
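The abstract describes the approach only at a high level. It says training examples are sorted into a binary search tree by their similarity to two local points, chosen either by similarity to global extreme points or at random. The following is a minimal Python sketch of that general idea, not the paper's implementation.

Several details are assumptions not stated in this record:
- Euclidean distance as the similarity measure.
- Ties are sent to the left branch.
- A `leaf_size` stopping rule ends the recursion.
- A plain majority-vote KNN runs only inside the leaf a query reaches.
- The "global extreme points" are read as the per-feature minimum and maximum vectors of the examples at each node.

The names `build_tree`, `predict`, and `_local_points` are hypothetical.

```python
import math
import random
from collections import Counter


def _dist(u, v):
    # Euclidean distance; the similarity measure is an assumption in this sketch.
    return math.dist(u, v)


def _local_points(X, idx, mode, rng):
    """Pick the two local points used to split the examples listed in idx."""
    if mode == "random":
        # Random variant: two distinct training examples chosen at random.
        p, q = rng.sample(idx, 2)
    else:
        # "extreme" variant (assumed reading): the global extreme points are the
        # per-feature minimum and maximum vectors of this node's examples, and
        # the local points are the examples closest to each of them.
        dims = range(len(X[idx[0]]))
        low = [min(X[i][d] for i in idx) for d in dims]
        high = [max(X[i][d] for i in idx) for d in dims]
        p = min(idx, key=lambda i: _dist(X[i], low))
        q = min(idx, key=lambda i: _dist(X[i], high))
    return X[p], X[q]


def build_tree(X, idx=None, mode="random", leaf_size=8, rng=None):
    """Recursively sort example indices into a binary tree of two-point splits."""
    if idx is None:
        idx = list(range(len(X)))
    if rng is None:
        rng = random.Random(0)
    if len(idx) <= max(leaf_size, 1):
        return {"leaf": idx}
    a, b = _local_points(X, idx, mode, rng)
    # Each example goes to the side of the local point it is closer to (ties go left).
    left = [i for i in idx if _dist(X[i], a) <= _dist(X[i], b)]
    right = [i for i in idx if _dist(X[i], a) > _dist(X[i], b)]
    if not left or not right:  # degenerate split: stop and keep this node as a leaf
        return {"leaf": idx}
    return {
        "a": a,
        "b": b,
        "left": build_tree(X, left, mode, leaf_size, rng),
        "right": build_tree(X, right, mode, leaf_size, rng),
    }


def predict(tree, X, y, query, k=3):
    """Descend to a single leaf, then run majority-vote KNN inside that leaf only."""
    node = tree
    while "leaf" not in node:
        closer_to_a = _dist(query, node["a"]) <= _dist(query, node["b"])
        node = node["left"] if closer_to_a else node["right"]
    nearest = sorted(node["leaf"], key=lambda i: _dist(X[i], query))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]
```

A query costs two distance computations per tree level plus at most `leaf_size` distances at the leaf, instead of one distance per training example. That is the kind of speed-up the abstract emphasizes. The trade-off in this sketch is that true nearest neighbors stored in a sibling leaf are never examined, which is one way accuracy can fall below that of exhaustive KNN.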