Accurate Classification of COVID-19 Based on Incomplete Heterogeneous Data using a KNN Variant Algorithm
Great efforts are now underway to control the coronavirus 2019 disease (COVID-19). Millions of people are medically examined, and their data keep piling up awaiting classification. The data are typically both incomplete and heterogeneous which hampers classical classification algorithms. Some resear...
Main authors: | Hamed, Ahmed; Sobhy, Ahmed; Nassar, Hamed |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931985/ https://www.ncbi.nlm.nih.gov/pubmed/33688457 http://dx.doi.org/10.1007/s13369-020-05212-z |
_version_ | 1783660397787611136 |
---|---|
author | Hamed, Ahmed; Sobhy, Ahmed; Nassar, Hamed |
author_facet | Hamed, Ahmed; Sobhy, Ahmed; Nassar, Hamed |
author_sort | Hamed, Ahmed |
collection | PubMed |
description | Great efforts are now underway to control the coronavirus 2019 disease (COVID-19). Millions of people are medically examined, and their data keep piling up awaiting classification. The data are typically both incomplete and heterogeneous, which hampers classical classification algorithms. Some researchers have recently modified the popular KNN algorithm as a solution, handling incompleteness by imputation and heterogeneity by converting categorical data into numbers. In this article, we introduce a novel KNN variant (KNNV) algorithm that provides better results, as demonstrated by thorough experimental work. We employ rough set theoretic techniques to handle both incompleteness and heterogeneity, as well as to find an ideal value for K. The KNNV algorithm takes an incomplete, heterogeneous dataset containing medical records of people and identifies the cases with COVID-19. In the process, we use two popular distance metrics, Euclidean and Mahalanobis, in an effort to widen the operational scope. The KNNV algorithm is implemented and tested on a real dataset from the Italian Society of Medical and Interventional Radiology. The experimental results show that it can efficiently and accurately classify COVID-19 cases. It is also compared to three KNN derivatives. The comparison results show that it greatly outperforms all its competitors in terms of four metrics: precision, recall, accuracy, and F-Score. The algorithm given in this article can be easily applied to classify other diseases. Moreover, its methodology can be further extended to general classification tasks outside the medical field. |
format | Online Article Text |
id | pubmed-7931985 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Springer Berlin Heidelberg |
record_format | MEDLINE/PubMed |
spelling | pubmed-79319852021-03-05 Accurate Classification of COVID-19 Based on Incomplete Heterogeneous Data using a KNN Variant Algorithm Hamed, Ahmed Sobhy, Ahmed Nassar, Hamed Arab J Sci Eng Research Article-Computer Engineering and Computer Science Great efforts are now underway to control the coronavirus 2019 disease (COVID-19). Millions of people are medically examined, and their data keep piling up awaiting classification. The data are typically both incomplete and heterogeneous which hampers classical classification algorithms. Some researchers have recently modified the popular KNN algorithm as a solution, where they handle incompleteness by imputation and heterogeneity by converting categorical data into numbers. In this article, we introduce a novel KNN variant (KNNV) algorithm that provides better results as demonstrated by thorough experimental work. We employ rough set theoretic techniques to handle both incompleteness and heterogeneity, as well as to find an ideal value for K. The KNNV algorithm takes an incomplete, heterogeneous dataset, containing medical records of people, and identifies those cases with COVID-19. We use in the process two popular distance metrics, Euclidean and Mahalanobis, in an effort to widen the operational scope. The KNNV algorithm is implemented and tested on a real dataset from the Italian Society of Medical and Interventional Radiology. The experimental results show that it can efficiently and accurately classify COVID-19 cases. It is also compared to three KNN derivatives. The comparison results show that it greatly outperforms all its competitors in terms of four metrics: precision, recall, accuracy, and F-Score. The algorithm given in this article can be easily applied to classify other diseases. Moreover, its methodology can be further extended to do general classification tasks outside the medical field. 
Springer Berlin Heidelberg 2021-03-04 2021 /pmc/articles/PMC7931985/ /pubmed/33688457 http://dx.doi.org/10.1007/s13369-020-05212-z Text en © King Fahd University of Petroleum & Minerals 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Research Article-Computer Engineering and Computer Science Hamed, Ahmed Sobhy, Ahmed Nassar, Hamed Accurate Classification of COVID-19 Based on Incomplete Heterogeneous Data using a KNN Variant Algorithm |
title | Accurate Classification of COVID-19 Based on Incomplete Heterogeneous Data using a KNN Variant Algorithm |
title_full | Accurate Classification of COVID-19 Based on Incomplete Heterogeneous Data using a KNN Variant Algorithm |
title_fullStr | Accurate Classification of COVID-19 Based on Incomplete Heterogeneous Data using a KNN Variant Algorithm |
title_full_unstemmed | Accurate Classification of COVID-19 Based on Incomplete Heterogeneous Data using a KNN Variant Algorithm |
title_short | Accurate Classification of COVID-19 Based on Incomplete Heterogeneous Data using a KNN Variant Algorithm |
title_sort | accurate classification of covid-19 based on incomplete heterogeneous data using a knn variant algorithm |
topic | Research Article-Computer Engineering and Computer Science |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931985/ https://www.ncbi.nlm.nih.gov/pubmed/33688457 http://dx.doi.org/10.1007/s13369-020-05212-z |
work_keys_str_mv | AT hamedahmed accurateclassificationofcovid19basedonincompleteheterogeneousdatausingaknnvariantalgorithm AT sobhyahmed accurateclassificationofcovid19basedonincompleteheterogeneousdatausingaknnvariantalgorithm AT nassarhamed accurateclassificationofcovid19basedonincompleteheterogeneousdatausingaknnvariantalgorithm |
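
The description above mentions KNN classification with a distance metric and an evaluation by precision, recall, accuracy, and F-Score. As a rough illustration of those terms only, here is a minimal sketch of a plain KNN classifier with Euclidean distance and the four metrics. It is not the authors' KNNV algorithm: it assumes complete, purely numeric data, uses no rough set techniques, fixes K by hand, and leaves out the Mahalanobis distance. The function names and the toy data are hypothetical and chosen for this example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, accuracy, and F-score for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f_score": f_score}


# Made-up toy data (NOT from the paper's dataset): label 1 = positive case.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(1.1, 1.0), (4.0, 4.0), (2.0, 2.0), (3.0, 3.0)]
test_y = [0, 1, 0, 1]

y_pred = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
metrics = binary_metrics(test_y, y_pred)
print(y_pred, metrics)
```

Because `knn_predict` takes the distance as a parameter, another metric such as Mahalanobis could be passed in without changing the voting logic; doing so would require estimating a covariance matrix from the training data, which this sketch does not attempt.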