Individual Factors Associated With COVID-19 Infection: A Machine Learning Study
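Main authors: | Ramírez-del Real, Tania; Martínez-García, Mireya; Márquez, Manlio F.; López-Trejo, Laura; Gutiérrez-Esparza, Guadalupe; Hernández-Lemus, Enrique |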
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
Subjects: | Public Health |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9279686/ https://www.ncbi.nlm.nih.gov/pubmed/35844896 http://dx.doi.org/10.3389/fpubh.2022.912099 |
author | Ramírez-del Real, Tania; Martínez-García, Mireya; Márquez, Manlio F.; López-Trejo, Laura; Gutiérrez-Esparza, Guadalupe; Hernández-Lemus, Enrique |
collection | PubMed |
description | The rapid, exponential increase in COVID-19 infections and their catastrophic effects on patients' health have required the development of tools that help health systems diagnose this disease and establish its prognosis quickly and efficiently. In this context, the present study aims to identify potential factors associated with COVID-19 infection by applying machine learning techniques: random forest, chi-squared, XGBoost, and rpart were used for feature selection, and ROSE and SMOTE were used as resampling methods to address class imbalance. In addition, machine and deep learning algorithms, including support vector machines, C4.5, random forest, rpart, and deep neural networks, were compared during the train/test phase to select the best prediction model. The dataset used in this study contains clinical data, anthropometric measurements, and other health parameters related to smoking habits, alcohol consumption, sleep quality, physical activity, and health status during the confinement imposed in response to the COVID-19 pandemic. The results showed that XGBoost yielded the best set of features associated with COVID-19 infection, and that random forest produced the best predictive model, with a balanced accuracy of 90.41% using SMOTE as the resampling technique. Because it identifies the variables carrying the highest risk, some of which are to a certain extent controllable, the best-performing model provides a tool to help prevent SARS-CoV-2 infection. |
id | pubmed-9279686 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Front Public Health (section: Public Health), Frontiers Media S.A., 2022-06-30; record pubmed-9279686, 2022-07-15 |
rights | Copyright © 2022 Ramírez-del Real, Martínez-García, Márquez, López-Trejo, Gutiérrez-Esparza and Hernández-Lemus. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
topic | Public Health |
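The description reports SMOTE as the resampling technique behind the best model and balanced accuracy as the evaluation metric. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' pipeline, which this record does not include, and the helper names `smote` and `balanced_accuracy` are hypothetical. SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours; balanced accuracy averages the recall of each class, so a model cannot score well simply by predicting the majority class.

```python
import math
import random
from collections import Counter


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: SMOTE-style oversampling of one minority class.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # at most len-1 other points exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nn)))
    return synthetic


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; for two classes, (sensitivity + specificity) / 2."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


if __name__ == "__main__":
    # Toy 2-D minority class; values are illustrative only.
    minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2)]
    print(smote(minority, n_synthetic=4, k=2))

    # Majority class 0 has 4 cases, minority class 1 has 2.
    y_true = [0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 1, 1, 0]
    print(balanced_accuracy(y_true, y_pred))  # 0.625
```

With the toy labels above, plain accuracy would be 4/6 ≈ 0.67, while balanced accuracy is 0.625 because the minority class is recalled only half the time. In practice, maintained implementations such as imbalanced-learn's `SMOTE` and scikit-learn's `balanced_accuracy_score` would normally be preferred over a hand-written version.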