A novel combined dynamic ensemble selection model for imbalanced data to detect COVID-19 from complete blood count
BACKGROUND: As blood testing is radiation-free, low-cost and simple to operate, some researchers use machine learning to detect COVID-19 from blood test data. However, few studies take into consideration the imbalanced data distribution, which can impair the performance of a classifier. METHOD: A no...
Main Authors: | Wu, Jiachao; Shen, Jiang; Xu, Man; Shao, Minglai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier B.V. 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8479386/ https://www.ncbi.nlm.nih.gov/pubmed/34614451 http://dx.doi.org/10.1016/j.cmpb.2021.106444 |
_version_ | 1784576244091715584 |
---|---|
author | Wu, Jiachao Shen, Jiang Xu, Man Shao, Minglai |
author_facet | Wu, Jiachao Shen, Jiang Xu, Man Shao, Minglai |
author_sort | Wu, Jiachao |
collection | PubMed |
description | BACKGROUND: As blood testing is radiation-free, low-cost and simple to operate, some researchers use machine learning to detect COVID-19 from blood test data. However, few studies take into consideration the imbalanced data distribution, which can impair the performance of a classifier. METHOD: A novel combined dynamic ensemble selection (DES) method is proposed for imbalanced data to detect COVID-19 from complete blood count. This method combines data preprocessing and improved DES. Firstly, we use the hybrid synthetic minority over-sampling technique and edited nearest neighbor (SMOTE-ENN) to balance data and remove noise. Secondly, in order to improve the performance of DES, a novel hybrid multiple clustering and bagging classifier generation (HMCBCG) method is proposed to reinforce the diversity and local regional competence of candidate classifiers. RESULTS: The experimental results based on three popular DES methods show that HMCBCG performs better than bagging alone. HMCBCG+KNE obtains the best performance for COVID-19 screening with 99.81% accuracy, 99.86% F1, 99.78% G-mean and 99.81% AUC. CONCLUSION: Compared to other advanced methods, our combined DES model can improve accuracy, G-mean, F1 and AUC of COVID-19 screening. |
format | Online Article Text |
id | pubmed-8479386 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Elsevier B.V. |
record_format | MEDLINE/PubMed |
spelling | pubmed-84793862021-09-29 A novel combined dynamic ensemble selection model for imbalanced data to detect COVID-19 from complete blood count Wu, Jiachao Shen, Jiang Xu, Man Shao, Minglai Comput Methods Programs Biomed Article BACKGROUND: As blood testing is radiation-free, low-cost and simple to operate, some researchers use machine learning to detect COVID-19 from blood test data. However, few studies take into consideration the imbalanced data distribution, which can impair the performance of a classifier. METHOD: A novel combined dynamic ensemble selection (DES) method is proposed for imbalanced data to detect COVID-19 from complete blood count. This method combines data preprocessing and improved DES. Firstly, we use the hybrid synthetic minority over-sampling technique and edited nearest neighbor (SMOTE-ENN) to balance data and remove noise. Secondly, in order to improve the performance of DES, a novel hybrid multiple clustering and bagging classifier generation (HMCBCG) method is proposed to reinforce the diversity and local regional competence of candidate classifiers. RESULTS: The experimental results based on three popular DES methods show that HMCBCG performs better than bagging alone. HMCBCG+KNE obtains the best performance for COVID-19 screening with 99.81% accuracy, 99.86% F1, 99.78% G-mean and 99.81% AUC. CONCLUSION: Compared to other advanced methods, our combined DES model can improve accuracy, G-mean, F1 and AUC of COVID-19 screening. Elsevier B.V. 2021-11 2021-09-29 /pmc/articles/PMC8479386/ /pubmed/34614451 http://dx.doi.org/10.1016/j.cmpb.2021.106444 Text en © 2021 Elsevier B.V. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. 
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Article Wu, Jiachao Shen, Jiang Xu, Man Shao, Minglai A novel combined dynamic ensemble selection model for imbalanced data to detect COVID-19 from complete blood count |
title | A novel combined dynamic ensemble selection model for imbalanced data to detect COVID-19 from complete blood count |
title_full | A novel combined dynamic ensemble selection model for imbalanced data to detect COVID-19 from complete blood count |
title_fullStr | A novel combined dynamic ensemble selection model for imbalanced data to detect COVID-19 from complete blood count |
title_full_unstemmed | A novel combined dynamic ensemble selection model for imbalanced data to detect COVID-19 from complete blood count |
title_short | A novel combined dynamic ensemble selection model for imbalanced data to detect COVID-19 from complete blood count |
title_sort | novel combined dynamic ensemble selection model for imbalanced data to detect covid-19 from complete blood count |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8479386/ https://www.ncbi.nlm.nih.gov/pubmed/34614451 http://dx.doi.org/10.1016/j.cmpb.2021.106444 |
work_keys_str_mv | AT wujiachao anovelcombineddynamicensembleselectionmodelforimbalanceddatatodetectcovid19fromcompletebloodcount AT shenjiang anovelcombineddynamicensembleselectionmodelforimbalanceddatatodetectcovid19fromcompletebloodcount AT xuman anovelcombineddynamicensembleselectionmodelforimbalanceddatatodetectcovid19fromcompletebloodcount AT shaominglai anovelcombineddynamicensembleselectionmodelforimbalanceddatatodetectcovid19fromcompletebloodcount AT wujiachao novelcombineddynamicensembleselectionmodelforimbalanceddatatodetectcovid19fromcompletebloodcount AT shenjiang novelcombineddynamicensembleselectionmodelforimbalanceddatatodetectcovid19fromcompletebloodcount AT xuman novelcombineddynamicensembleselectionmodelforimbalanceddatatodetectcovid19fromcompletebloodcount AT shaominglai novelcombineddynamicensembleselectionmodelforimbalanceddatatodetectcovid19fromcompletebloodcount |
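
The abstract reports G-mean and F1 alongside accuracy because accuracy alone can look good on imbalanced data. The following is a minimal sketch of that point, not the authors' code: the function names (`confusion_counts`, `screening_metrics`) and the label vectors are made up for illustration, and the class ratio is an assumption rather than the ratio in the paper's dataset.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def screening_metrics(y_true, y_pred, positive=1):
    """Return accuracy, F1 and G-mean for binary labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = _ratio(tp, tp + fn)  # recall on the positive class
    specificity = _ratio(tn, tn + fp)  # recall on the negative class
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        # G-mean is the geometric mean of the two per-class recalls.
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical labels: 95 majority-class samples (0) and 5 minority-class samples (1).
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that always predicts the majority class.
always_majority = [0] * 100
baseline = screening_metrics(y_true, always_majority)

# A classifier that finds 4 of the 5 minority samples at the cost of 5 false alarms.
balanced_pred = [1] * 5 + [0] * 90 + [1] * 4 + [0] * 1
balanced = screening_metrics(y_true, balanced_pred)

print(baseline)
print(balanced)
```

In this toy setup the always-majority classifier reaches 95% accuracy while its F1 and G-mean are both 0, whereas the second classifier has slightly lower accuracy but a much higher G-mean. That gap is the motivation for rebalancing the training data before building the ensemble.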