Classification of the Disposition of Patients Hospitalized with COVID-19: Reading Discharge Summaries Using Natural Language Processing

BACKGROUND: Medical notes are a rich source of patient data; however, the nature of unstructured text has largely precluded the use of these data for large retrospective analyses. Transforming clinical text into structured data can enable large-scale research studies with electronic health records (...

Bibliographic Details
Main Authors:	Fernandes, Marta, Sun, Haoqi, Jain, Aayushee, Alabsi, Haitham S, Brenner, Laura N, Ye, Elissa, Ge, Wendong, Collens, Sarah I, Leone, Michael J, Das, Sudeshna, Robbins, Gregory K, Mukerji, Shibani S, Westover, M Brandon
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2021
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7879729/ https://www.ncbi.nlm.nih.gov/pubmed/33449908 http://dx.doi.org/10.2196/25457

author	Fernandes, Marta Sun, Haoqi Jain, Aayushee Alabsi, Haitham S Brenner, Laura N Ye, Elissa Ge, Wendong Collens, Sarah I Leone, Michael J Das, Sudeshna Robbins, Gregory K Mukerji, Shibani S Westover, M Brandon
author_sort	Fernandes, Marta
collection	PubMed
description	BACKGROUND: Medical notes are a rich source of patient data; however, the nature of unstructured text has largely precluded the use of these data for large retrospective analyses. Transforming clinical text into structured data can enable large-scale research studies with electronic health records (EHR) data. Natural language processing (NLP) can be used for text information retrieval, reducing the need for labor-intensive chart review. Here we present an application of NLP to large-scale analysis of medical records at 2 large hospitals for patients hospitalized with COVID-19.
OBJECTIVE: Our study goal was to develop an NLP pipeline to classify the discharge disposition (home, inpatient rehabilitation, skilled nursing inpatient facility [SNIF], and death) of patients hospitalized with COVID-19 based on hospital discharge summary notes.
METHODS: Text mining and feature engineering were applied to unstructured text from hospital discharge summaries. The study included patients with COVID-19 discharged from 2 hospitals in the Boston, Massachusetts area (Massachusetts General Hospital and Brigham and Women’s Hospital) between March 10, 2020, and June 30, 2020. The data were divided into a training set (70%) and hold-out test set (30%). Discharge summaries were represented as bags-of-words consisting of single words (unigrams), bigrams, and trigrams. The number of features was reduced during training by excluding n-grams that occurred in fewer than 10% of discharge summaries, and further reduced using least absolute shrinkage and selection operator (LASSO) regularization while training a multiclass logistic regression model. Model performance was evaluated using the hold-out test set.
RESULTS: The study cohort included 1737 adult patients (median age 61 [SD 18] years; 55% men; 45% White and 16% Black; 14% nonsurvivors and 61% discharged home). The model selected 179 from a vocabulary of 1056 engineered features, consisting of combinations of unigrams, bigrams, and trigrams. The top features contributing most to the classification by the model (for each outcome) were the following: “appointments specialty,” “home health,” and “home care” (home); “intubate” and “ARDS” (inpatient rehabilitation); “service” (SNIF); “brief assessment” and “covid” (death). The model achieved a micro-average area under the receiver operating characteristic curve value of 0.98 (95% CI 0.97-0.98) and average precision of 0.81 (95% CI 0.75-0.84) in the testing set for prediction of discharge disposition.
CONCLUSIONS: A supervised learning–based NLP approach is able to classify the discharge disposition of patients hospitalized with COVID-19. This approach has the potential to accelerate and increase the scale of research on patients’ discharge disposition that is possible with EHR data.
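The featurization step described in METHODS (bags of unigrams, bigrams, and trigrams, dropping n-grams that occur in fewer than 10% of discharge summaries) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the whitespace tokenizer, and the toy notes are all assumptions made for illustration, and the later LASSO-regularized multiclass logistic regression step is not shown.

```python
from collections import Counter


def ngrams(tokens, n_max=3):
    """Yield every contiguous n-gram of length 1..n_max as a space-joined string."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def build_vocabulary(docs, min_doc_fraction=0.10, n_max=3):
    """Keep n-grams that occur in at least min_doc_fraction of the documents."""
    doc_freq = Counter()
    for text in docs:
        # A set counts each n-gram once per document (document frequency).
        # Lowercasing and whitespace splitting are assumed preprocessing.
        doc_freq.update(set(ngrams(text.lower().split(), n_max)))
    threshold = min_doc_fraction * len(docs)
    return sorted(g for g, df in doc_freq.items() if df >= threshold)


def bag_of_words(text, vocabulary, n_max=3):
    """Return the n-gram count vector of `text` over a fixed vocabulary."""
    counts = Counter(ngrams(text.lower().split(), n_max))
    return [counts[g] for g in vocabulary]


# Toy, invented notes (hypothetical; not real patient data).
notes = [
    "discharged home with home health",
    "discharged home in stable condition",
    "transferred to rehabilitation after intubation",
    "discharged to skilled nursing facility",
]

# A 50% threshold is used here only because the toy corpus is tiny;
# the abstract reports a 10% threshold.
vocab = build_vocabulary(notes, min_doc_fraction=0.5)
vectors = [bag_of_words(note, vocab) for note in notes]
```

In practice the vocabulary would be built from the training set only and then applied unchanged to the hold-out test set, so that the test notes do not influence feature selection.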
format	Online Article Text
id	pubmed-7879729
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-7879729 2021-02-23 JMIR Med Inform Original Paper JMIR Publications 2021-02-10 /pmc/articles/PMC7879729/ /pubmed/33449908 http://dx.doi.org/10.2196/25457 Text en ©Marta Fernandes, Haoqi Sun, Aayushee Jain, Haitham S Alabsi, Laura N Brenner, Elissa Ye, Wendong Ge, Sarah I Collens, Michael J Leone, Sudeshna Das, Gregory K Robbins, Shibani S Mukerji, M Brandon Westover. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 10.02.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title	Classification of the Disposition of Patients Hospitalized with COVID-19: Reading Discharge Summaries Using Natural Language Processing
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7879729/ https://www.ncbi.nlm.nih.gov/pubmed/33449908 http://dx.doi.org/10.2196/25457