Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system
OBJECTIVES: To develop and test a scalable, performant, and rule-based model for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the unstructured data in electronic health records (EHRs). MATERIALS AND METHODS: We included patien...
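Main authors: | Gray, Geoffrey M; Zirikly, Ayah; Ahumada, Luis M; Rouhizadeh, Masoud; Richards, Thomas; Kitchen, Christopher; Foroughmand, Iman; Hatef, Elham |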
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Research and Applications |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10550267/ https://www.ncbi.nlm.nih.gov/pubmed/37799347 http://dx.doi.org/10.1093/jamiaopen/ooad085 |
_version_ | 1785115497935667200 |
---|---|
author | Gray, Geoffrey M Zirikly, Ayah Ahumada, Luis M Rouhizadeh, Masoud Richards, Thomas Kitchen, Christopher Foroughmand, Iman Hatef, Elham |
author_facet | Gray, Geoffrey M Zirikly, Ayah Ahumada, Luis M Rouhizadeh, Masoud Richards, Thomas Kitchen, Christopher Foroughmand, Iman Hatef, Elham |
author_sort | Gray, Geoffrey M |
collection | PubMed |
description | OBJECTIVES: To develop and test a scalable, performant, and rule-based model for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the unstructured data in electronic health records (EHRs). MATERIALS AND METHODS: We included patients aged 18 years or older who received care at the Johns Hopkins Health System (JHHS) between July 2016 and June 2021 and had at least 1 unstructured (free-text) note in their EHR during the study period. We used a combination of manual lexicon curation and semiautomated lexicon creation for feature development. We developed an initial rules-based pipeline (Match Pipeline) using 2 keyword sets for each social needs domain. We performed rule-based keyword matching for distinct lexicons and tested the algorithm using an annotated dataset comprising 192 patients. Starting with a set of expert-identified keywords, we tested the adjustments by evaluating false positives and negatives identified in the labeled dataset. We assessed the performance of the algorithm using measures of precision, recall, and F1 score. RESULTS: The algorithm for identifying residential instability had the best overall performance, with a weighted average for precision, recall, and F1 score of 0.92, 0.84, and 0.92 for identifying patients with homelessness and 0.84, 0.82, and 0.79 for identifying patients with housing insecurity. Metrics for the food insecurity algorithm were high but the transportation issues algorithm was the lowest overall performing metric. DISCUSSION: The NLP algorithm in identifying social needs at JHHS performed relatively well and would provide the opportunity for implementation in a healthcare system. CONCLUSION: The NLP approach developed in this project could be adapted and potentially operationalized in the routine data processes of a healthcare system. |
format | Online Article Text |
id | pubmed-10550267 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10550267 2023-10-05 Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system Gray, Geoffrey M Zirikly, Ayah Ahumada, Luis M Rouhizadeh, Masoud Richards, Thomas Kitchen, Christopher Foroughmand, Iman Hatef, Elham JAMIA Open Research and Applications OBJECTIVES: To develop and test a scalable, performant, and rule-based model for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the unstructured data in electronic health records (EHRs). MATERIALS AND METHODS: We included patients aged 18 years or older who received care at the Johns Hopkins Health System (JHHS) between July 2016 and June 2021 and had at least 1 unstructured (free-text) note in their EHR during the study period. We used a combination of manual lexicon curation and semiautomated lexicon creation for feature development. We developed an initial rules-based pipeline (Match Pipeline) using 2 keyword sets for each social needs domain. We performed rule-based keyword matching for distinct lexicons and tested the algorithm using an annotated dataset comprising 192 patients. Starting with a set of expert-identified keywords, we tested the adjustments by evaluating false positives and negatives identified in the labeled dataset. We assessed the performance of the algorithm using measures of precision, recall, and F1 score. RESULTS: The algorithm for identifying residential instability had the best overall performance, with a weighted average for precision, recall, and F1 score of 0.92, 0.84, and 0.92 for identifying patients with homelessness and 0.84, 0.82, and 0.79 for identifying patients with housing insecurity. Metrics for the food insecurity algorithm were high but the transportation issues algorithm was the lowest overall performing metric. DISCUSSION: The NLP algorithm in identifying social needs at JHHS performed relatively well and would provide the opportunity for implementation in a healthcare system. CONCLUSION: The NLP approach developed in this project could be adapted and potentially operationalized in the routine data processes of a healthcare system. Oxford University Press 2023-10-04 /pmc/articles/PMC10550267/ /pubmed/37799347 http://dx.doi.org/10.1093/jamiaopen/ooad085 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Research and Applications Gray, Geoffrey M Zirikly, Ayah Ahumada, Luis M Rouhizadeh, Masoud Richards, Thomas Kitchen, Christopher Foroughmand, Iman Hatef, Elham Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system |
title | Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system |
title_full | Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system |
title_fullStr | Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system |
title_full_unstemmed | Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system |
title_short | Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system |
title_sort | application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10550267/ https://www.ncbi.nlm.nih.gov/pubmed/37799347 http://dx.doi.org/10.1093/jamiaopen/ooad085 |
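
The abstract in this record describes rule-based keyword matching against per-domain lexicons, evaluated with precision, recall, and F1 score. The following is a minimal sketch of that general idea, not the authors' Match Pipeline: the keywords in `LEXICONS`, the domain labels, and the helper names `compile_lexicon`, `match_domains`, and `precision_recall_f1` are all hypothetical and chosen only for illustration. It computes plain binary metrics rather than the weighted averages reported in the abstract, and it does no negation handling.

```python
import re
from typing import Dict, List, Pattern, Set, Tuple

# Hypothetical lexicons: illustrative keywords only, NOT the study's curated lists.
LEXICONS: Dict[str, List[str]] = {
    "residential_instability": ["homeless", "homelessness", "eviction"],
    "food_insecurity": ["food insecurity", "food pantry"],
    "transportation_issues": ["no transportation", "lack of transportation"],
}


def compile_lexicon(keywords: List[str]) -> Pattern[str]:
    """Build one case-insensitive, whole-word regex from a keyword list."""
    # Longest keywords first so multi-word phrases win over their prefixes.
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


PATTERNS: Dict[str, Pattern[str]] = {
    domain: compile_lexicon(words) for domain, words in LEXICONS.items()
}


def match_domains(note: str) -> Set[str]:
    """Return every domain whose lexicon matches somewhere in the note."""
    return {domain for domain, pat in PATTERNS.items() if pat.search(note)}


def precision_recall_f1(
    predicted: List[bool], actual: List[bool]
) -> Tuple[float, float, float]:
    """Binary precision, recall, and F1 for one domain."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Under these assumptions, `match_domains("Patient is currently homeless and uses a food pantry.")` flags the residential and food domains. A real pipeline would also need to deal with negated mentions such as "denies homelessness", which this sketch would count as a match.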