Individual Factors Associated With COVID-19 Infection: A Machine Learning Study

The fast, exponential increase of COVID-19 infections and their catastrophic effects on patients' health have required the development of tools that support health systems in the quick and efficient diagnosis and prognosis of this disease. In this context, the present study aims to identify the...

Bibliographic Details
Main authors: Ramírez-del Real, Tania; Martínez-García, Mireya; Márquez, Manlio F.; López-Trejo, Laura; Gutiérrez-Esparza, Guadalupe; Hernández-Lemus, Enrique
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Public Health
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9279686/
https://www.ncbi.nlm.nih.gov/pubmed/35844896
http://dx.doi.org/10.3389/fpubh.2022.912099
author Ramírez-del Real, Tania
Martínez-García, Mireya
Márquez, Manlio F.
López-Trejo, Laura
Gutiérrez-Esparza, Guadalupe
Hernández-Lemus, Enrique
collection PubMed
description The fast, exponential increase of COVID-19 infections and their catastrophic effects on patients' health have required the development of tools that support health systems in the quick and efficient diagnosis and prognosis of this disease. In this context, the present study aims to identify the potential factors associated with COVID-19 infections by applying machine learning techniques: random forest, chi-squared, XGBoost, and rpart were used for feature selection, and ROSE and SMOTE were used as resampling methods to address class imbalance. Machine and deep learning algorithms such as support vector machines, C4.5, random forest, rpart, and deep neural networks were then explored during the train/test phase to select the best prediction model. The dataset used in this study contains clinical data, anthropometric measurements, and other health parameters related to smoking habits, alcohol consumption, quality of sleep, physical activity, and health status during confinement due to the COVID-19 pandemic. The results showed that XGBoost yielded the best set of features associated with COVID-19 infection, and that random forest gave the best predictive model, with a balanced accuracy of 90.41% using SMOTE as the resampling technique. The best-performing model provides a tool to help prevent SARS-CoV-2 infection, since it identifies the variables that carry the highest risk, some of which are, to a certain extent, controllable.
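Note on the reported metric: the description cites balanced accuracy and class imbalance together. The sketch below is illustrative only and is not taken from the article. It assumes the usual definition of balanced accuracy as the mean of per-class recall. The label vectors (`y_true`, `y_majority`) and the 90/10 class split are hypothetical and are chosen only to show why plain accuracy can mislead on imbalanced data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall; for two classes this is the average of
    sensitivity and specificity."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels: 90 uninfected (0) and 10 infected (1).
y_true = [0] * 90 + [1] * 10
# A trivial classifier that always predicts the majority class.
y_majority = [0] * 100

plain = accuracy(y_true, y_majority)
balanced = balanced_accuracy(y_true, y_majority)
print(f"accuracy={plain:.2f} balanced_accuracy={balanced:.2f}")
# prints: accuracy=0.90 balanced_accuracy=0.50
```

In this toy case the majority-class predictor reaches 90% plain accuracy while detecting no infected cases, and balanced accuracy drops to 50%. This is one common reason for reporting balanced accuracy when resampling methods such as SMOTE are in play.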
format Online
Article
Text
id pubmed-9279686
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9279686 2022-07-15 Individual Factors Associated With COVID-19 Infection: A Machine Learning Study Ramírez-del Real, Tania Martínez-García, Mireya Márquez, Manlio F. López-Trejo, Laura Gutiérrez-Esparza, Guadalupe Hernández-Lemus, Enrique Front Public Health Public Health Frontiers Media S.A.
2022-06-30 /pmc/articles/PMC9279686/ /pubmed/35844896 http://dx.doi.org/10.3389/fpubh.2022.912099 Text en Copyright © 2022 Ramírez-del Real, Martínez-García, Márquez, López-Trejo, Gutiérrez-Esparza and Hernández-Lemus. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Individual Factors Associated With COVID-19 Infection: A Machine Learning Study
topic Public Health