Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets

The coronavirus pandemic caused by the novel SARS-CoV-2 has significantly impacted human health and the economy, especially in countries with limited financial resources for medical testing and treatment, such as Brazil, the third most affected country. In this scenario, ma...

Bibliographic Details
Main Authors:	Dorn, Marcio; Grisci, Bruno Iochins; Narloch, Pedro Henrique; Feltes, Bruno César; Avila, Eduardo; Kahmann, Alessandro; Alho, Clarice Sampaio
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2021
Subjects:	Bioinformatics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8372002/ https://www.ncbi.nlm.nih.gov/pubmed/34458574 http://dx.doi.org/10.7717/peerj-cs.670

author	Dorn, Marcio; Grisci, Bruno Iochins; Narloch, Pedro Henrique; Feltes, Bruno César; Avila, Eduardo; Kahmann, Alessandro; Alho, Clarice Sampaio
author_sort	Dorn, Marcio
collection	PubMed
description	The coronavirus pandemic caused by the novel SARS-CoV-2 has significantly impacted human health and the economy, especially in countries with limited financial resources for medical testing and treatment, such as Brazil, the third most affected country. In this scenario, machine learning techniques have been heavily employed to analyze different types of medical data and aid decision making, offering a low-cost alternative. Given the urgency of fighting the pandemic, a large number of studies apply machine learning approaches to clinical data, including complete blood count (CBC) tests, which are among the most widely available medical tests. In this work, we review the most commonly employed machine learning classifiers for CBC data, together with popular sampling methods for dealing with class imbalance. Additionally, we describe and critically analyze three publicly available Brazilian COVID-19 CBC datasets and evaluate the performance of eight classifiers and five sampling techniques on the selected datasets. Our work provides a panorama of which classifiers and sampling methods provide the best results for different relevant metrics and discusses their impact on future analyses. The metrics and algorithms are introduced in a way that aids newcomers to the field. Finally, the panorama discussed here can significantly benefit the comparison of the results of new ML algorithms.
format	Online Article Text
id	pubmed-8372002
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-8372002 2021-08-26 Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets Dorn, Marcio; Grisci, Bruno Iochins; Narloch, Pedro Henrique; Feltes, Bruno César; Avila, Eduardo; Kahmann, Alessandro; Alho, Clarice Sampaio PeerJ Comput Sci Bioinformatics PeerJ Inc. 2021-08-12 /pmc/articles/PMC8372002/ /pubmed/34458574 http://dx.doi.org/10.7717/peerj-cs.670 Text en © 2021 Dorn et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8372002/ https://www.ncbi.nlm.nih.gov/pubmed/34458574 http://dx.doi.org/10.7717/peerj-cs.670
