Natural language processing-driven state machines to extract social factors from unstructured clinical documentation

Bibliographic Details
Main authors: Allen, Katie S, Hood, Dan R, Cummins, Jonathan, Kasturi, Suranga, Mendonca, Eneida A, Vest, Joshua R
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Research and Applications
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10112959/
https://www.ncbi.nlm.nih.gov/pubmed/37081945
http://dx.doi.org/10.1093/jamiaopen/ooad024
Collection: PubMed

Abstract

OBJECTIVE: This study sought to create natural language processing (NLP) algorithms to extract the presence of social factors from clinical text in 3 areas: (1) housing, (2) financial, and (3) unemployment. To assess generalizability, the finalized models were validated on data from a separate health system.

MATERIALS AND METHODS: Notes from 2 healthcare systems, representing a variety of note types, were used. To train models, the study used n-grams to identify keywords and implemented NLP state machines across all note types. Performance was determined by manual review. Notes were sampled at a set percentage chosen according to the prevalence of each social need. Models were optimized over multiple training and evaluation cycles. Performance was measured using positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity.

RESULTS: PPV for housing rose from 0.71 to 0.95 over 3 training iterations. PPV for financial rose from 0.83 to 0.89 over 2 training iterations, and PPV for unemployment rose from 0.78 to 0.88 over 3 iterations. On the test data, PPVs were 0.94, 0.97, and 0.95 for housing, financial, and unemployment, respectively. Final specificity scores were 0.95, 0.97, and 0.95 for housing, financial, and unemployment, respectively.

DISCUSSION: We developed 3 rule-based NLP algorithms, trained across health systems. Although this is a less sophisticated approach, the algorithms demonstrated a high degree of generalizability, maintaining values >0.85 across all predictive performance metrics.

CONCLUSION: The rule-based NLP algorithms demonstrated consistent performance in identifying 3 social factors within clinical text. These methods may form part of a strategy to measure social factors within an institution.
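
The abstract states that n-grams were used to identify keywords but does not describe the procedure. The following is a minimal sketch, assuming simple lowercase word tokenization and raw frequency counts, of how candidate n-grams could be surfaced from a set of notes for manual keyword review. The function names and sample notes are hypothetical and are not taken from the study.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/apostrophes; punctuation and digits are dropped.
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    # Every contiguous window of n tokens (empty if there are fewer than n tokens).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(notes: list[str], n: int, k: int) -> list[tuple[str, int]]:
    """Count n-grams across all notes and return the k most frequent."""
    counts = Counter()
    for note in notes:
        counts.update(" ".join(gram) for gram in ngrams(tokenize(note), n))
    return counts.most_common(k)


# Hypothetical example notes (not study data).
SAMPLE_NOTES = [
    "Patient is homeless and staying at a shelter.",
    "Pt reports being homeless since May.",
    "Lives with daughter; no housing concerns.",
]

if __name__ == "__main__":
    print(top_ngrams(SAMPLE_NOTES, 1, 1))  # [('homeless', 2)]
```

On larger corpora, raw counts are dominated by function words, so filtering stopwords or comparing against a background corpus is a common refinement before a term is adopted as a keyword.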
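
The abstract does not specify which states or vocabularies the authors' state machines used. As a hedged illustration only, the sketch below shows one way a rule-based finite-state scanner could flag a housing mention while skipping negated ones. The state names, keyword list, negation cues, sentence-boundary rule, and three-token negation window are all hypothetical.

```python
import re
from enum import Enum, auto


class State(Enum):
    SEARCHING = auto()  # no active negation; keywords count as mentions
    NEGATED = auto()    # a negation cue was seen within the last few tokens
    POSITIVE = auto()   # accepting state: an un-negated keyword was found


# Hypothetical vocabularies and window size, not the study's actual rules.
HOUSING_KEYWORDS = {"homeless", "homelessness", "shelter", "evicted", "eviction"}
NEGATION_CUES = {"no", "not", "denies", "denied", "without"}
SENTENCE_END = {".", ";", "!", "?"}
NEGATION_WINDOW = 3


def tokenize(text: str) -> list[str]:
    # Words plus sentence-ending punctuation, which closes a negation's scope.
    return re.findall(r"[a-z']+|[.;!?]", text.lower())


def mentions_factor(text: str, keywords: set[str] = HOUSING_KEYWORDS) -> bool:
    """Return True if the text contains a keyword that is not negated."""
    state = State.SEARCHING
    remaining = 0  # tokens left before the current negation expires
    for tok in tokenize(text):
        if tok in SENTENCE_END:
            state, remaining = State.SEARCHING, 0
        elif tok in NEGATION_CUES:
            state, remaining = State.NEGATED, NEGATION_WINDOW
        elif tok in keywords:
            if state is State.SEARCHING:
                state = State.POSITIVE
                break  # accepting state reached; stop scanning
            # Otherwise the keyword is inside a negation window and is ignored.
        elif state is State.NEGATED:
            remaining -= 1
            if remaining == 0:
                state = State.SEARCHING
    return state is State.POSITIVE


if __name__ == "__main__":
    print(mentions_factor("Denies homelessness."))
    print(mentions_factor("No concerns. Recently evicted from apartment."))
```

With these rules, "Denies homelessness." is rejected, while "No concerns. Recently evicted from apartment." is accepted because the sentence boundary closes the negation scope. Encoding negation as explicit states keeps every rule inspectable, in keeping with a rule-based design, although a fixed token window will misfire on some phrasings.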
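
The abstract reports PPV, NPV, sensitivity, and specificity based on manual review. Assuming each reviewed note is tallied as a true or false positive or negative against the reviewer's label, the four metrics follow from the standard confusion-matrix ratios sketched here. The class and function names are hypothetical, and the counts in the example are made up for illustration; they are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # algorithm positive, reviewer positive
    fp: int  # algorithm positive, reviewer negative
    tn: int  # algorithm negative, reviewer negative
    fn: int  # algorithm negative, reviewer positive


def _ratio(num: int, den: int) -> float:
    # NaN when the denominator is empty (e.g., the algorithm flagged nothing).
    return num / den if den else float("nan")


def metrics(c: Confusion) -> dict[str, float]:
    return {
        "ppv": _ratio(c.tp, c.tp + c.fp),          # precision
        "npv": _ratio(c.tn, c.tn + c.fn),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # recall
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


if __name__ == "__main__":
    # Illustrative counts only, not from the study.
    print(metrics(Confusion(tp=90, fp=10, tn=95, fn=5)))
```

For these illustrative counts, PPV is 0.90 and NPV is 0.95, sensitivity is about 0.947 (90/95), and specificity is about 0.905 (95/105).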

Record ID: pubmed-10112959
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: JAMIA Open, Research and Applications
Publication date: 2023-04-18
© The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com