
The prediction of hospital length of stay using unstructured data

OBJECTIVE: This study aimed to assess the performance improvement for machine learning-based hospital length of stay (LOS) predictions when clinical signs written in text are accounted for and compared to the traditional approach of solely considering structured information such as age, gender and m...


Bibliographic Details
Main authors:	Chrusciel, Jan; Girardon, François; Roquette, Lucien; Laplanche, David; Duclos, Antoine; Sanchez, Stéphane
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8684269/ https://www.ncbi.nlm.nih.gov/pubmed/34922532 http://dx.doi.org/10.1186/s12911-021-01722-4

author	Chrusciel, Jan; Girardon, François; Roquette, Lucien; Laplanche, David; Duclos, Antoine; Sanchez, Stéphane
author_sort	Chrusciel, Jan
collection	PubMed
description	OBJECTIVE: This study aimed to assess the performance improvement for machine learning-based hospital length of stay (LOS) predictions when clinical signs written in text are accounted for and compared to the traditional approach of solely considering structured information such as age, gender and major ICD diagnosis. METHODS: This study was an observational retrospective cohort study and analyzed patient stays admitted between 1 January and 24 September 2019. For each stay, a patient was admitted through the Emergency Department (ED) and stayed for more than two days in the subsequent service. LOS was predicted using two random forest models. The first included unstructured text extracted from electronic health records (EHRs). A word-embedding algorithm based on UMLS terminology with exact matching restricted to patient-centric affirmation sentences was used to assess the EHR data. The second model was primarily based on structured data in the form of diagnoses coded from the International Classification of Disease 10th Edition (ICD-10) and triage codes (CCMU/GEMSA classifications). Variables common to both models were: age, gender, zip/postal code, LOS in the ED, recent visit flag, assigned patient ward after the ED stay and short-term ED activity. Models were trained on 80% of data and performance was evaluated by accuracy on the remaining 20% test data. RESULTS: The model using unstructured data had a 75.0% accuracy compared to 74.1% for the model containing structured data. The two models produced a similar prediction in 86.6% of cases. In a secondary analysis restricted to intensive care patients, the accuracy of both models was also similar (76.3% vs 75.0%). CONCLUSIONS: LOS prediction using unstructured data had similar accuracy to using structured data and can be considered of use to accurately model LOS. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-021-01722-4.
format	Online Article Text
id	pubmed-8684269
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8684269 2021-12-20 The prediction of hospital length of stay using unstructured data Chrusciel, Jan; Girardon, François; Roquette, Lucien; Laplanche, David; Duclos, Antoine; Sanchez, Stéphane BMC Med Inform Decis Mak Research BioMed Central 2021-12-18 /pmc/articles/PMC8684269/ /pubmed/34922532 http://dx.doi.org/10.1186/s12911-021-01722-4 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	The prediction of hospital length of stay using unstructured data
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8684269/ https://www.ncbi.nlm.nih.gov/pubmed/34922532 http://dx.doi.org/10.1186/s12911-021-01722-4
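
The evaluation described in the abstract of this record (an 80%/20% train/test split, accuracy on the held-out stays, and the share of cases in which the two models gave the same prediction) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names, the two LOS classes ("short" and "long"), and the toy labels are all hypothetical, and the random forest training step itself is omitted.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle a copy of `items` and return (train, test) in an 80/20 split."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def match_rate(labels_a, labels_b):
    """Fraction of positions at which two equal-length label lists agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if not labels_a:
        raise ValueError("label lists must not be empty")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def accuracy(y_true, y_pred):
    """Accuracy: share of predictions equal to the observed class."""
    return match_rate(y_true, y_pred)


def agreement_rate(pred_a, pred_b):
    """Share of cases in which two models output the same class."""
    return match_rate(pred_a, pred_b)


# Toy, made-up labels for five held-out stays (NOT data from the study).
# The class names "short" and "long" are an assumption for illustration.
y_true = ["long", "short", "long", "long", "short"]
pred_text_model = ["long", "short", "short", "long", "short"]
pred_structured_model = ["long", "long", "short", "long", "short"]

acc_text = accuracy(y_true, pred_text_model)
acc_structured = accuracy(y_true, pred_structured_model)
agreement = agreement_rate(pred_text_model, pred_structured_model)

# Hypothetical stay identifiers, only to show the 80/20 split sizes.
train_ids, test_ids = split_80_20(range(10))
```

Accuracy compares each model with the observed class, while the agreement rate compares the two models with each other, so two models can have similar accuracy and still disagree on individual stays.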
