
Effective treatment of imbalanced datasets in health care using modified SMOTE coupled with stacked deep learning algorithms

Bibliographic Details
Main authors: Sowjanya, A. Mary; Mrudula, Owk
Format: Online; Article; Text
Language: English
Published: Springer International Publishing, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8811587/
https://www.ncbi.nlm.nih.gov/pubmed/35132368
http://dx.doi.org/10.1007/s13204-021-02063-4
collection PubMed
description Health care is one of the prominent applications of predictive analytics, where proper analysis of cumulative datasets supports more accurate predictions. Oftentimes the datasets are highly imbalanced, and sampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE) give only moderate accuracy in such cases. To overcome this problem, a two-step approach is proposed. In the first step, SMOTE is modified into two variants that reduce class imbalance, Distance-based SMOTE (D-SMOTE) and Bi-phasic SMOTE (BP-SMOTE), which were then coupled with selected classifiers for prediction. Both BP-SMOTE and D-SMOTE improved accuracy compared with basic SMOTE. In the second step, machine learning, deep learning and ensemble algorithms were used to develop a Stacking Ensemble Framework, in which stacking achieved significantly higher accuracy than individual machine learning algorithms (Decision Tree, Naïve Bayes, Neural Networks) and ensemble techniques (Voting, Bagging and Boosting). Two methods were developed by combining deep learning with the stacking approach, namely Stacked CNN and Stacked RNN, which yielded significantly higher accuracy (96–97%) than the individual algorithms. The Framingham dataset is used for data sampling, the Wisconsin Hospital breast cancer dataset is used for Stacked CNN, and the Novel Coronavirus 2019 dataset, relating to forecasting COVID-19 cases, is used for Stacked RNN.
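The abstract does not say how D-SMOTE or BP-SMOTE modify the original algorithm, so the sketch below illustrates only the basic SMOTE baseline they are compared against: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. It is a minimal pure-Python illustration under stated assumptions, not the authors' code; the function names `smote` and `oversample`, the default `k=5`, Euclidean distance, and the fixed random seed are all illustrative choices.

```python
import math
import random
from typing import List, Sequence, Tuple

# Illustrative sketch of *standard* SMOTE only. It is NOT the paper's
# D-SMOTE or BP-SMOTE, whose modifications the abstract does not describe.

Point = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[Point]:
    """Create n_synthetic minority samples by interpolating toward neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    k = min(k, len(minority) - 1)

    # k nearest minority-class neighbours of every minority sample.
    neighbours = []
    for i, p in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: _euclidean(p, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random minority sample
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # position on the segment, in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def oversample(X: Sequence[Sequence[float]], y: Sequence[int], minority_label: int,
               k: int = 5, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Oversample one class until it matches the largest other class."""
    minority = [list(x) for x, label in zip(X, y) if label == minority_label]
    largest = max(sum(1 for lab in y if lab == c) for c in set(y) if c != minority_label)
    needed = max(0, largest - len(minority))
    new_points = smote(minority, needed, k=k, seed=seed)
    return [list(x) for x in X] + new_points, list(y) + [minority_label] * len(new_points)


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9], [5.0, 5.0], [5.2, 4.9]]
    y = [0, 0, 0, 0, 1, 1]
    X_bal, y_bal = oversample(X, y, minority_label=1, k=1)
    print(y_bal.count(0), y_bal.count(1))  # 4 4
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region the minority class already occupies rather than duplicating rows exactly. For real work, a maintained implementation such as the `SMOTE` class in imbalanced-learn (`imblearn.over_sampling.SMOTE`, with its `k_neighbors` parameter) would usually be preferred over hand-rolled code.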
id pubmed-8811587
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Appl Nanosci (Original Article), Springer International Publishing, 2022-02-03
Rights: © King Abdulaziz City for Science and Technology 2021. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.