
Data analytics and clinical feature ranking of medical records of patients with sepsis

BACKGROUND: Sepsis is a life-threatening clinical condition that happens when the patient’s body has an excessive reaction to an infection, and should be treated within one hour. Due to the urgency of sepsis, doctors and physicians often do not have enough time to perform laboratory tests and analyses t...


Bibliographic Details
Main authors: Chicco, Davide; Oneto, Luca
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7860202/
https://www.ncbi.nlm.nih.gov/pubmed/33536030
http://dx.doi.org/10.1186/s13040-021-00235-0
author Chicco, Davide
Oneto, Luca
collection PubMed
description BACKGROUND: Sepsis is a life-threatening clinical condition that happens when the patient’s body has an excessive reaction to an infection, and should be treated within one hour. Due to the urgency of sepsis, doctors and physicians often do not have enough time to perform laboratory tests and analyses to help them forecast the consequences of the sepsis episode. In this context, machine learning can provide a fast computational prediction of sepsis severity, patient survival, and sequential organ failure by just analyzing the electronic health records of the patients. Also, machine learning can be employed to understand which features in the medical records are more predictive of sepsis severity, of patient survival, and of sequential organ failure in a fast and non-invasive way. DATASET AND METHODS: In this study, we analyzed a dataset of electronic health records of 364 patients collected between 2014 and 2016. The medical record of each patient has 29 clinical features, and includes a binary value for survival, a binary value for septic shock, and a numerical value for the sequential organ failure assessment (SOFA) score. We used each of these three factors separately as an independent target, and employed several machine learning methods to predict it (binary classifiers for survival and septic shock, and regression analysis for the SOFA score). Afterwards, we used a data mining approach to identify the most important dataset features in relation to each of the three targets separately, and compared these results with the results achieved through a standard biostatistics approach. RESULTS AND CONCLUSIONS: Our results showed that machine learning can be employed efficiently to predict septic shock, SOFA score, and survival of patients diagnosed with sepsis, from their electronic health record data. Regarding clinical feature ranking, our results showed that Random Forests feature selection identified several unexpected symptoms and clinical components as relevant for septic shock, SOFA score, and survival. These discoveries can help doctors and physicians in understanding and predicting septic shock. We made the analyzed dataset and our developed software code publicly available online. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13040-021-00235-0.
format Online
Article
Text
id pubmed-7860202
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7860202 2021-02-05 Data analytics and clinical feature ranking of medical records of patients with sepsis Chicco, Davide Oneto, Luca BioData Min Research BioMed Central 2021-02-03 /pmc/articles/PMC7860202/ /pubmed/33536030 http://dx.doi.org/10.1186/s13040-021-00235-0 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Data analytics and clinical feature ranking of medical records of patients with sepsis
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7860202/
https://www.ncbi.nlm.nih.gov/pubmed/33536030
http://dx.doi.org/10.1186/s13040-021-00235-0
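
The workflow summarized in the abstract (three separate targets, classifiers for the binary ones, regression for the SOFA score, and Random Forests feature ranking) can be illustrated with a minimal sketch. This is not the authors' published code: it assumes scikit-learn and NumPy, uses synthetic data in place of the real dataset, and the feature names, target construction, and hyperparameters are all hypothetical. Only the sizes (364 patients, 29 features) come from the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Synthetic stand-in data: NOT the study's dataset.
rng = np.random.default_rng(0)
n_patients, n_features = 364, 29  # sizes taken from the abstract
X = rng.normal(size=(n_patients, n_features))
feature_names = [f"feature_{i}" for i in range(n_features)]  # hypothetical names

# Hypothetical targets, built so that one known column drives each of them.
survival = (X[:, 0] + 0.5 * rng.normal(size=n_patients) > 0).astype(int)
septic_shock = (X[:, 1] + 0.5 * rng.normal(size=n_patients) > 0).astype(int)
sofa_score = 5.0 + 2.0 * X[:, 2] + 0.5 * rng.normal(size=n_patients)


def rank_features(model, X, y, names):
    """Fit a forest and return (name, importance) pairs, most important first."""
    model.fit(X, y)
    importances = model.feature_importances_  # impurity-based importances
    order = np.argsort(importances)[::-1]
    return [(names[i], float(importances[i])) for i in order]


# One model per target: classifiers for binary targets, a regressor for SOFA.
targets = {
    "survival": (RandomForestClassifier(n_estimators=200, random_state=0), survival),
    "septic_shock": (RandomForestClassifier(n_estimators=200, random_state=0), septic_shock),
    "sofa_score": (RandomForestRegressor(n_estimators=200, random_state=0), sofa_score),
}

rankings = {
    name: rank_features(model, X, y, feature_names)
    for name, (model, y) in targets.items()
}

for name, ranking in rankings.items():
    print(name, [feature for feature, _ in ranking[:3]])
```

Each target gets its own ranking because a feature that matters for survival need not matter for septic shock or the SOFA score. Impurity-based importance is only one possible ranking criterion; the abstract does not say which one the study used.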