
Accurate Classification of COVID-19 Based on Incomplete Heterogeneous Data using a KNN Variant Algorithm


Bibliographic Details
Main Authors: Hamed, Ahmed; Sobhy, Ahmed; Nassar, Hamed
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2021
Subjects: Research Article-Computer Engineering and Computer Science
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931985/
https://www.ncbi.nlm.nih.gov/pubmed/33688457
http://dx.doi.org/10.1007/s13369-020-05212-z
Collection: PubMed

Abstract
Great efforts are now underway to control coronavirus disease 2019 (COVID-19). Millions of people are medically examined, and their data keep piling up awaiting classification. The data are typically both incomplete and heterogeneous, which hampers classical classification algorithms. Some researchers have recently modified the popular k-nearest neighbors (KNN) algorithm as a solution, handling incompleteness by imputation and heterogeneity by converting categorical data into numbers. In this article, we introduce a novel KNN variant (KNNV) algorithm that, as demonstrated by thorough experimental work, provides better results. We employ rough set theoretic techniques to handle both incompleteness and heterogeneity, as well as to find an ideal value for K. The KNNV algorithm takes an incomplete, heterogeneous dataset containing people's medical records and identifies the cases with COVID-19. In the process, we use two popular distance metrics, Euclidean and Mahalanobis, to widen the operational scope. The KNNV algorithm is implemented and tested on a real dataset from the Italian Society of Medical and Interventional Radiology. The experimental results show that it can efficiently and accurately classify COVID-19 cases. It is also compared to three KNN derivatives, and the comparison results show that it greatly outperforms all of them in terms of four metrics: precision, recall, accuracy, and F-Score. The algorithm given in this article can be easily applied to classify other diseases. Moreover, its methodology can be further extended to general classification tasks outside the medical field.
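
The abstract names the two distance metrics and the four evaluation metrics but does not describe the rough-set machinery, so the sketch below is only a plain KNN baseline, not the authors' KNNV algorithm. It assumes complete, purely numeric records (no missing values and no categorical attributes, which are exactly what KNNV is designed to handle) and a binary label where 1 means COVID-19. It illustrates that switching from Euclidean to Mahalanobis distance changes only the distance function (Mahalanobis uses the inverse sample covariance of the training data), and how precision, recall, accuracy, and F-Score are computed. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def mean(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of the training rows."""
    mu = mean(rows)
    n, d = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting; raises on a singular matrix."""
    d = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
           for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def euclidean(x, y, _inv_cov=None):
    """Straight-line distance; the covariance argument is ignored."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mahalanobis(x, y, inv_cov):
    """d(x, y) = sqrt((x - y)^T S^-1 (x - y)), with S^-1 passed in as inv_cov."""
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    return math.sqrt(sum(diff[i] * inv_cov[i][j] * diff[j]
                         for i in range(d) for j in range(d)))


def knn_predict(train_x, train_y, query, k, metric=euclidean):
    """Majority vote among the k training rows closest to the query."""
    # The inverse covariance is computed once per query, only when needed.
    inv_cov = invert(covariance(train_x)) if metric is mahalanobis else None
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: metric(query, pair[0], inv_cov))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def scores(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy, F-Score) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, accuracy, f_score


if __name__ == "__main__":
    # Hypothetical two-feature toy records; label 1 stands for a COVID-19 case.
    train_x = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]]
    train_y = [0, 0, 0, 1, 1, 1]
    for q in ([1.2, 1.3], [8.2, 8.4]):
        print(q, knn_predict(train_x, train_y, q, k=3),
              knn_predict(train_x, train_y, q, k=3, metric=mahalanobis))
```

Mahalanobis distance discounts differences along directions in which the training features vary together, which is one reason a classifier may offer it alongside Euclidean distance; it requires a non-singular covariance matrix, so the sketch raises an error when the features are linearly dependent.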

Record ID: pubmed-7931985
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Arab J Sci Eng
Section: Research Article-Computer Engineering and Computer Science
Publication date: 2021-03-04
Copyright: © King Fahd University of Petroleum & Minerals 2020
License: This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.