
Natural Language Processing of Electronic Patient Records to Predict Psychiatric Inpatients at Risk of Early Readmission to Hospital Using Predictive Models Derived Through Machine Learning

AIMS: Psychiatric readmissions cause a burden on the healthcare system, incur a monetary cost and cause additional distress to acutely unwell patients. This project explores the use of the free-text of electronic patient records to predict inpatients in psychiatric hospitals at risk of readmission u...

Bibliographic Details
Main Authors:	Kapadi, Tarif; Luz, Saturnino
Format:	Online Article Text
Language:	English
Published:	Cambridge University Press 2022
Subjects:	Rapid-Fire Presentation
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9378236/ http://dx.doi.org/10.1192/bjo.2022.87

author	Kapadi, Tarif; Luz, Saturnino
collection	PubMed
description	AIMS: Psychiatric readmissions cause a burden on the healthcare system, incur a monetary cost and cause additional distress to acutely unwell patients. This project explores the use of the free-text of electronic patient records to predict inpatients in psychiatric hospitals at risk of readmission using predictive models generated by machine learning. METHODS: Free-text was extracted from the electronic patient records of patients admitted to hospitals in Birmingham and Solihull Mental Health Foundation Trust (BSMHFT) during the five years 2015–2019 inclusive. The anonymised records were obtained via the CRIS (Clinical Record Interactive Search) database. A total of 17208 records were extracted. The free-text entered by clinicians during an admission was extracted and processed using techniques of natural language processing to generate input vectors suitable to be used with machine learning algorithms. tf-idf (term frequency-inverse document frequency) vectors were used. A selection of algorithms were used to train predictive models. Two-thirds of the records were used as training data with the remainder as test data. Baseline model performance was assessed and then best-performing candidates underwent hyperparameter optimisation using five-fold cross-validation to improve performance. Bayesian optimisation was used to automate hyperparameter tuning during cross-validation. Hyperparameters were optimised on the log loss function. As the dataset was imbalanced with negative instances outnumbering positive instances to a significant degree, various techniques such as random undersampling of negative instances in the training data were used to deal with class imbalance throughout this process. Following cross-validation, the best-performing models underwent performance analysis. Models were used to make predictions on the test data. 
Performance was assessed using F1-measures, precision-recall curves and the average precision metric (equivalent to area under the precision-recall curve). These metrics were chosen due to their suitability in assessing models trained on imbalanced datasets. RESULTS: The best F1 score obtained was 0.233 using a Random Forest model trained using unigram tf-idf vectors of 500 token dimension. The best average precision obtained was 0.157 using a Support Vector Machine trained using unigram tf-idf vectors of 2000 token dimension. Both the above results required the use of random oversampling of positive instances to improve performance on the imbalanced dataset. CONCLUSION: The performance indicates that the models generated are unlikely to have significant practical utility. Nevertheless, this exploratory project has produced a processed dataset with knowledge about its characteristics. This could be used for the further development of models using more complex techniques such as language modelling using neural networks.
format	Online Article Text
id	pubmed-9378236
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Cambridge University Press
record_format	MEDLINE/PubMed
spelling	pubmed-9378236 2022-08-18 BJPsych Open Rapid-Fire Presentation Kapadi, Tarif; Luz, Saturnino Cambridge University Press 2022-06-20 /pmc/articles/PMC9378236/ http://dx.doi.org/10.1192/bjo.2022.87 Text en © The Author(s) 2022. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Natural Language Processing of Electronic Patient Records to Predict Psychiatric Inpatients at Risk of Early Readmission to Hospital Using Predictive Models Derived Through Machine Learning
topic	Rapid-Fire Presentation
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9378236/ http://dx.doi.org/10.1192/bjo.2022.87

Natural Language Processing of Electronic Patient Records to Predict Psychiatric Inpatients at Risk of Early Readmission to Hospital Using Predictive Models Derived Through Machine Learning
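The abstract names random oversampling of positive instances, the F1 measure and average precision, but gives no implementation details. The following is a minimal sketch in plain Python of what those three steps compute, not the authors' code. All function names are hypothetical, labels are assumed to be 0 (not readmitted) or 1 (readmitted), and the average-precision routine assumes no tied scores.

```python
import random
from typing import List, Sequence, Tuple


def random_oversample(
    rows: Sequence, labels: Sequence[int], seed: int = 0
) -> Tuple[list, List[int]]:
    """Duplicate randomly chosen positive rows until both classes are the same size."""
    rng = random.Random(seed)
    positives = [r for r, y in zip(rows, labels) if y == 1]
    negatives = [r for r, y in zip(rows, labels) if y == 0]
    # Nothing to do if there are no positives or the positives are not the minority.
    if not positives or len(positives) >= len(negatives):
        return list(rows), list(labels)
    extra = [rng.choice(positives) for _ in range(len(negatives) - len(positives))]
    return list(rows) + extra, list(labels) + [1] * len(extra)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values measured at the rank of each true positive.

    Assumes no tied scores; ties would be ordered arbitrarily by the sort.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(y_true)
    if total_pos == 0:
        return 0.0
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at this recall step
    return ap / total_pos
```

In a setup like the one described, oversampling would be applied only to the training portion of the split, so that the test data keeps its original class balance when F1 and average precision are computed.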

Similar Items