A Proximity Weighted Evidential k Nearest Neighbor Classifier for Imbalanced Data

In the k Nearest Neighbor (kNN) classifier, a query instance is classified based on the most frequent class of its nearest neighbors among the training instances. In imbalanced datasets, kNN becomes biased towards the majority instances of the training space. To solve this problem, we propose a method c...

Bibliographic Details
Main authors: Kadir, Md. Eusha, Akash, Pritom Saha, Sharmin, Sadia, Ali, Amin Ahsan, Shoyaib, Mohammad
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206335/
http://dx.doi.org/10.1007/978-3-030-47436-2_6
_version_ 1783530395059355648
author Kadir, Md. Eusha
Akash, Pritom Saha
Sharmin, Sadia
Ali, Amin Ahsan
Shoyaib, Mohammad
author_facet Kadir, Md. Eusha
Akash, Pritom Saha
Sharmin, Sadia
Ali, Amin Ahsan
Shoyaib, Mohammad
author_sort Kadir, Md. Eusha
collection PubMed
description In the k Nearest Neighbor (kNN) classifier, a query instance is classified based on the most frequent class of its nearest neighbors among the training instances. In imbalanced datasets, kNN becomes biased towards the majority instances of the training space. To solve this problem, we propose a method called the Proximity weighted Evidential kNN classifier. In this method, each neighbor of a query instance is treated as a piece of evidence from which we calculate the probability of the class label given the feature values, in order to give more preference to the minority instances. This probability is then discounted by the proximity of the neighbor, so that closer instances in the local neighborhood are prioritized. These pieces of evidence are then combined using the Dempster-Shafer theory of evidence. A rigorous experiment over 30 benchmark imbalanced datasets shows that our method performs better than 12 popular methods. In pairwise comparisons of these 12 methods with our method, our method wins on 29 datasets in the best case and on at least 19 datasets in the worst case. More importantly, according to the Friedman test, the proposed method ranks higher than all other methods in terms of AUC at the 5% level of significance.
format Online
Article
Text
id pubmed-7206335
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7206335 2020-05-08 A Proximity Weighted Evidential k Nearest Neighbor Classifier for Imbalanced Data Kadir, Md. Eusha Akash, Pritom Saha Sharmin, Sadia Ali, Amin Ahsan Shoyaib, Mohammad Advances in Knowledge Discovery and Data Mining Article In the k Nearest Neighbor (kNN) classifier, a query instance is classified based on the most frequent class of its nearest neighbors among the training instances. In imbalanced datasets, kNN becomes biased towards the majority instances of the training space. To solve this problem, we propose a method called the Proximity weighted Evidential kNN classifier. In this method, each neighbor of a query instance is treated as a piece of evidence from which we calculate the probability of the class label given the feature values, in order to give more preference to the minority instances. This probability is then discounted by the proximity of the neighbor, so that closer instances in the local neighborhood are prioritized. These pieces of evidence are then combined using the Dempster-Shafer theory of evidence. A rigorous experiment over 30 benchmark imbalanced datasets shows that our method performs better than 12 popular methods. In pairwise comparisons of these 12 methods with our method, our method wins on 29 datasets in the best case and on at least 19 datasets in the worst case. More importantly, according to the Friedman test, the proposed method ranks higher than all other methods in terms of AUC at the 5% level of significance. 2020-04-17 /pmc/articles/PMC7206335/ http://dx.doi.org/10.1007/978-3-030-47436-2_6 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Kadir, Md. Eusha
Akash, Pritom Saha
Sharmin, Sadia
Ali, Amin Ahsan
Shoyaib, Mohammad
A Proximity Weighted Evidential k Nearest Neighbor Classifier for Imbalanced Data
title A Proximity Weighted Evidential k Nearest Neighbor Classifier for Imbalanced Data
title_full A Proximity Weighted Evidential k Nearest Neighbor Classifier for Imbalanced Data
title_fullStr A Proximity Weighted Evidential k Nearest Neighbor Classifier for Imbalanced Data
title_full_unstemmed A Proximity Weighted Evidential k Nearest Neighbor Classifier for Imbalanced Data
title_short A Proximity Weighted Evidential k Nearest Neighbor Classifier for Imbalanced Data
title_sort proximity weighted evidential k nearest neighbor classifier for imbalanced data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206335/
http://dx.doi.org/10.1007/978-3-030-47436-2_6
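
The description in this record outlines a pipeline: each neighbor contributes a piece of evidence, that evidence is discounted by proximity, and the pieces are combined with Dempster-Shafer theory. The record gives no formulas, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. The exponential distance discount, the 0.95 ceiling, the inverse-frequency class weight, and the names combine and classify are all hypothetical choices made for this example. Only Dempster's rule of combination itself is standard.

```python
import math
from collections import Counter
from itertools import product


def combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset of classes."""
    joint = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            joint[common] = joint.get(common, 0.0) + wa * wb
        else:
            # Mass that falls on the empty set is conflict and is normalized away.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the masses cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in joint.items()}


def classify(query, train_x, train_y, k=3, gamma=1.0):
    """Evidential kNN sketch; the weighting scheme is an assumption."""
    classes = frozenset(train_y)
    counts = Counter(train_y)
    rarest = min(counts.values())
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(query, pair[0])
    )[:k]

    belief = {classes: 1.0}  # vacuous belief: all mass on "any class"
    for x, y in nearest:
        # Assumed class weight: 1.0 for the rarest class, smaller for frequent ones.
        class_weight = rarest / counts[y]
        # Assumed proximity discount: support decays with squared distance.
        support = 0.95 * class_weight * math.exp(-gamma * math.dist(query, x) ** 2)
        evidence = {classes: 1.0 - support}
        single = frozenset([y])
        evidence[single] = evidence.get(single, 0.0) + support
        belief = combine(belief, evidence)

    label = max(classes, key=lambda c: belief.get(frozenset([c]), 0.0))
    return label, belief


if __name__ == "__main__":
    xs = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.9, 1.0), (1.0, 1.2)]
    ys = ["pos", "pos", "neg", "neg", "neg", "neg"]
    print(classify((0.05, 0.0), xs, ys, k=3))
```

Masses are stored as dictionaries keyed by frozenset so that the same combine function handles singleton classes and the full set of classes (the "unknown" part of each neighbor's evidence). Any real use would need the weighting and discount functions from the paper itself.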