Improved disease diagnosis system for COVID-19 with data refactoring and handling methods

The novel coronavirus illness (COVID-19) outbreak, which began in a seafood market in Wuhan, Hubei Province, China, in mid-December 2019, has spread to almost all countries, territories, and places throughout the world. And since the fault in diagnosis of a disease causes a psychological impact, thi...


Bibliographic Details
Main authors: Jha, Ritesh; Bhattacharjee, Vandana; Mustafi, Abhijit; Sahana, Sudip Kumar
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Psychology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9416861/
https://www.ncbi.nlm.nih.gov/pubmed/36033018
http://dx.doi.org/10.3389/fpsyg.2022.951027
author Jha, Ritesh
Bhattacharjee, Vandana
Mustafi, Abhijit
Sahana, Sudip Kumar
collection PubMed
description The novel coronavirus illness (COVID-19) outbreak, which began in a seafood market in Wuhan, Hubei Province, China, in mid-December 2019, has spread to almost all countries and territories throughout the world. Errors in diagnosing a disease have a psychological impact, and this was clearly visible during the spread of COVID-19. This research aims to address the issue by providing a better solution for diagnosing COVID-19. The paper also addresses the important problem of limited data for disease prediction models by elaborating on data handling techniques. Special focus is therefore given to data processing and handling, with the aim of developing an improved machine learning model for the diagnosis of COVID-19. Random Forest (RF), Decision Tree (DT), K-Nearest Neighbor (KNN), Logistic Regression (LR), Support Vector Machine (SVM), and Deep Neural Network (DNN) models are developed using the Hospital Israelita Albert Einstein (São Paulo, Brazil) dataset to diagnose COVID-19. The dataset is pre-processed, and a distributed DT is applied to rank the features. Data augmentation is applied to generate datasets for improving classification accuracy. The DNN model outperforms all the other techniques, giving the highest accuracy of 96.99%, recall of 96.98%, and precision of 96.94%, which is better than or comparable to other research work. All the algorithms are implemented in a distributed environment on the Spark platform.
format Online
Article
Text
id pubmed-9416861
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9416861 2022-08-27 Improved disease diagnosis system for COVID-19 with data refactoring and handling methods Jha, Ritesh Bhattacharjee, Vandana Mustafi, Abhijit Sahana, Sudip Kumar Front Psychol Psychology Frontiers Media S.A. 2022-08-12 /pmc/articles/PMC9416861/ /pubmed/36033018 http://dx.doi.org/10.3389/fpsyg.2022.951027 Text en Copyright © 2022 Jha, Bhattacharjee, Mustafi and Sahana.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Improved disease diagnosis system for COVID-19 with data refactoring and handling methods
topic Psychology