Development and assessment of a natural language processing model to identify residential instability in electronic health records’ unstructured data: a comparison of 3 integrated healthcare delivery systems

Bibliographic details
Main authors: Hatef, Elham; Rouhizadeh, Masoud; Nau, Claudia; Xie, Fagen; Rouillard, Christopher; Abu-Nasser, Mahmoud; Padilla, Ariadna; Lyons, Lindsay Joe; Kharrazi, Hadi; Weiner, Jonathan P; Roblin, Douglas
Format: Online article, text
Language: English
Published: Oxford University Press, 2022
Subjects: Research and Applications
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8867582/
https://www.ncbi.nlm.nih.gov/pubmed/35224458
http://dx.doi.org/10.1093/jamiaopen/ooac006

Abstract
OBJECTIVE: To evaluate whether a natural language processing (NLP) algorithm could be adapted to extract, with acceptable validity, markers of residential instability (ie, homelessness and housing insecurity) from electronic health records (EHRs) of 3 healthcare systems.

MATERIALS AND METHODS: We included patients 18 years and older who received care at 1 of 3 healthcare systems from 2016 through 2020 and had at least 1 free-text note in the EHR during this period. We conducted the study independently at each site; the NLP algorithm logic and method of validity assessment were identical across sites, but the approach to developing the gold standard for validity assessment differed. Using the EntityRuler module of the spaCy 2.3 Python toolkit, we created a rule-based NLP system made up of expert-developed patterns indicating residential instability at the lead site and enriched it using insight gained from its application at the other 2 sites. We adapted the algorithm at each site and then validated it using a split-sample approach. We assessed the performance of the algorithm by positive predictive value (precision), sensitivity (recall), and specificity.

RESULTS: The NLP algorithm performed with moderate precision (0.45, 0.73, and 1.0) at the 3 sites. Its sensitivity and specificity also varied across the 3 sites (sensitivity: 0.68, 0.85, and 0.96; specificity: 0.69, 0.89, and 1.0).

DISCUSSION: The performance of this NLP algorithm in identifying residential instability in 3 different healthcare systems suggests the algorithm is generally valid and applicable in other healthcare systems with similar EHRs.

CONCLUSION: The NLP approach developed in this project is adaptable and can be modified to extract types of social needs other than residential instability from EHRs across different healthcare systems.
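The abstract describes a rule-based system built with spaCy's EntityRuler, which tags text spans that match hand-written patterns (for example, a token pattern such as `[{"LOWER": "living"}, {"LOWER": "in"}, {"LOWER": "shelter"}]`). The sketch below is a dependency-free stand-in for that idea, not the study's implementation: the patterns in `PATTERNS` are hypothetical examples rather than the expert-developed patterns, the tokenizer is a simple regular expression instead of spaCy's, and overlapping matches are not resolved.

```python
import re
from typing import List, Tuple

# HYPOTHETICAL example patterns, for illustration only; the study's expert-developed
# patterns are not reproduced here. Each pattern is a sequence of lowercase tokens.
PATTERNS: List[Tuple[str, ...]] = [
    ("homeless",),
    ("living", "in", "shelter"),
    ("lives", "in", "car"),
    ("eviction",),
    ("couch", "surfing"),
]
LABEL = "RESIDENTIAL_INSTABILITY"


def tokenize(text: str) -> List[str]:
    """Lowercase word tokenizer: a crude stand-in for spaCy's tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def find_matches(text: str) -> List[Tuple[str, int, int, str]]:
    """Return (label, start, end, matched text) for each pattern occurrence.

    Offsets are token indices; overlapping matches are all kept.
    """
    tokens = tokenize(text)
    matches = []
    for pattern in PATTERNS:
        n = len(pattern)
        for start in range(len(tokens) - n + 1):
            if tuple(tokens[start:start + n]) == pattern:
                matches.append((LABEL, start, start + n, " ".join(tokens[start:start + n])))
    # Report matches in document order.
    return sorted(matches, key=lambda m: (m[1], m[2]))


def note_is_positive(text: str) -> bool:
    """Flag a note as indicating residential instability if any pattern matches."""
    return bool(find_matches(text))


if __name__ == "__main__":
    note = "Pt reports she is homeless and living in shelter since eviction."
    for match in find_matches(note):
        print(match)
```

Exact token matching misses variants (`homelessness` does not match `homeless`) and ignores negation, so a note reading "denies being homeless" is still flagged; cases like these are the kind of detail a rule set typically needs tuning for at each site.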
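The abstract reports validity as positive predictive value (precision), sensitivity (recall), and specificity, assessed with a split-sample approach. A minimal sketch of that computation follows, assuming binary note-level labels (the unit of analysis is an assumption) compared against a gold standard, and a random 50/50 split chosen only for illustration; `split_sample` and `validity_measures` are hypothetical names, not the study's code. The measures are precision = TP / (TP + FP), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP).

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_sample(items: Sequence[T], validation_fraction: float = 0.5,
                 seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split annotated notes into a development set and a held-out validation set.

    The 50/50 default is a HYPOTHETICAL choice, not the split used in the study.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def confusion_counts(gold: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, int]:
    """Count true/false positives and negatives of predictions against gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for g, p in zip(gold, predicted):
        if p and g:
            counts["tp"] += 1
        elif p:
            counts["fp"] += 1
        elif g:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def validity_measures(gold: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Precision (PPV), sensitivity (recall), and specificity; NaN when a denominator is 0."""
    c = confusion_counts(gold, predicted)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "precision": ratio(c["tp"], c["tp"] + c["fp"]),
        "sensitivity": ratio(c["tp"], c["tp"] + c["fn"]),
        "specificity": ratio(c["tn"], c["tn"] + c["fp"]),
    }


if __name__ == "__main__":
    gold = [True] * 4 + [False] * 6
    predicted = [True, True, True, False, True] + [False] * 5
    print(validity_measures(gold, predicted))
```

In the example run, 3 true positives, 1 false positive, 1 false negative, and 5 true negatives give precision 0.75, sensitivity 0.75, and specificity 5/6. Patterns would be adapted on the development set only, so the validation-set measures are not inflated by tuning.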
Record source: PubMed (National Center for Biotechnology Information), MEDLINE/PubMed record pubmed-8867582
Journal: JAMIA Open, Research and Applications. Published 2022-02-16 by Oxford University Press.
© The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.