Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system

OBJECTIVES: To develop and test a scalable, performant, and rule-based model for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the unstructured data in electronic health records (EHRs). MATERIALS AND METHODS: We included patien...

Bibliographic Details
Main Authors: Gray, Geoffrey M; Zirikly, Ayah; Ahumada, Luis M; Rouhizadeh, Masoud; Richards, Thomas; Kitchen, Christopher; Foroughmand, Iman; Hatef, Elham
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Research and Applications
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10550267/
https://www.ncbi.nlm.nih.gov/pubmed/37799347
http://dx.doi.org/10.1093/jamiaopen/ooad085
_version_ 1785115497935667200
author Gray, Geoffrey M
Zirikly, Ayah
Ahumada, Luis M
Rouhizadeh, Masoud
Richards, Thomas
Kitchen, Christopher
Foroughmand, Iman
Hatef, Elham
collection PubMed
description OBJECTIVES: To develop and test a scalable, performant, and rule-based model for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the unstructured data in electronic health records (EHRs). MATERIALS AND METHODS: We included patients aged 18 years or older who received care at the Johns Hopkins Health System (JHHS) between July 2016 and June 2021 and had at least 1 unstructured (free-text) note in their EHR during the study period. We used a combination of manual lexicon curation and semiautomated lexicon creation for feature development. We developed an initial rule-based pipeline (Match Pipeline) using 2 keyword sets for each social needs domain. We performed rule-based keyword matching for distinct lexicons and tested the algorithm using an annotated dataset comprising 192 patients. Starting with a set of expert-identified keywords, we tested the adjustments by evaluating false positives and negatives identified in the labeled dataset. We assessed the performance of the algorithm using measures of precision, recall, and F1 score. RESULTS: The algorithm for identifying residential instability had the best overall performance, with a weighted average for precision, recall, and F1 score of 0.92, 0.84, and 0.92 for identifying patients with homelessness and 0.84, 0.82, and 0.79 for identifying patients with housing insecurity. Metrics for the food insecurity algorithm were high, but the transportation issues algorithm had the lowest overall performance. DISCUSSION: The NLP algorithm for identifying social needs at JHHS performed relatively well and would provide the opportunity for implementation in a healthcare system. CONCLUSION: The NLP approach developed in this project could be adapted and potentially operationalized in the routine data processes of a healthcare system.
format Online
Article
Text
id pubmed-10550267
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10550267 2023-10-05 JAMIA Open Research and Applications Oxford University Press 2023-10-04 /pmc/articles/PMC10550267/ /pubmed/37799347 http://dx.doi.org/10.1093/jamiaopen/ooad085 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system
topic Research and Applications
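The abstract in this record describes rule-based keyword matching against per-domain lexicons, evaluated with weighted-average precision, recall, and F1 score. The following is a minimal sketch of that general technique, not the authors' Match Pipeline. The lexicon phrases, the function names (compile_lexicon, match_domains, prf, weighted_prf), and the choice to weight each class by its support are all assumptions made for illustration; the study's actual keyword sets and weighting scheme are not given in this record. The sketch matches whole phrases only and does not handle negation ("denies homelessness") or word variants.

```python
import re
from collections import Counter

# Hypothetical lexicons: illustrative phrases only, NOT the study's keyword sets.
LEXICONS = {
    "residential_instability": ["homeless", "shelter", "eviction"],
    "food_insecurity": ["food insecurity", "food pantry", "food stamps"],
    "transportation_issues": ["no transportation", "lack of transportation"],
}


def compile_lexicon(phrases):
    """Build one case-insensitive, whole-word pattern for a list of phrases."""
    # Longest phrases first so longer alternatives are tried before shorter ones.
    escaped = sorted((re.escape(p) for p in phrases), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


PATTERNS = {domain: compile_lexicon(p) for domain, p in LEXICONS.items()}


def match_domains(note):
    """Return the set of social-needs domains whose lexicon matches the note."""
    return {domain for domain, pat in PATTERNS.items() if pat.search(note)}


def prf(y_true, y_pred, label):
    """Precision, recall, and F1 for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def weighted_prf(y_true, y_pred):
    """Average per-class metrics, weighting each class by its support in y_true.

    Assumption: support weighting; the record does not state the scheme used.
    """
    support = Counter(y_true)
    total = len(y_true)
    sums = [0.0, 0.0, 0.0]
    for label, n in support.items():
        for i, value in enumerate(prf(y_true, y_pred, label)):
            sums[i] += value * n / total
    return tuple(sums)


if __name__ == "__main__":
    note = "Patient is homeless and staying at a shelter."
    print(match_domains(note))
    # Toy labels for one domain: 1 = need present, 0 = need absent.
    print(weighted_prf([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a workflow like the one the abstract outlines, false positives and false negatives found against the annotated set would feed back into the lexicons, and the metrics would be recomputed after each adjustment.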