
Natural language processing-driven state machines to extract social factors from unstructured clinical documentation

Bibliographic Details
Main authors:	Allen, Katie S; Hood, Dan R; Cummins, Jonathan; Kasturi, Suranga; Mendonca, Eneida A; Vest, Joshua R
Format:	Online Article Text
Language:	English
Published:	Oxford University Press, 2023
Subjects:	Research and Applications
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10112959/ https://www.ncbi.nlm.nih.gov/pubmed/37081945 http://dx.doi.org/10.1093/jamiaopen/ooad024

Collection:	PubMed
Record ID:	pubmed-10112959
Institution:	National Center for Biotechnology Information
Record format:	MEDLINE/PubMed
Journal:	JAMIA Open
Published online:	2023-04-18
License:	© The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Abstract

OBJECTIVE: This study sought to create natural language processing (NLP) algorithms to extract the presence of social factors from clinical text in 3 areas: (1) housing, (2) financial, and (3) unemployment. For generalizability, finalized models were validated on data from a separate health system.

MATERIALS AND METHODS: Notes from 2 healthcare systems, representing a variety of note types, were used. To train models, the study used n-grams to identify keywords and implemented NLP state machines across all note types. Manual review was conducted to determine performance. Notes were sampled at a set percentage determined by the prevalence of social need. Models were optimized over multiple training and evaluation cycles. Performance was measured using positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity.

RESULTS: PPV for housing rose from 0.71 to 0.95 over 3 training runs. PPV for financial rose from 0.83 to 0.89 over 2 training iterations, while PPV for unemployment rose from 0.78 to 0.88 over 3 iterations. On the test data, PPVs were 0.94, 0.97, and 0.95 for housing, financial, and unemployment, respectively. Final specificity scores were 0.95, 0.97, and 0.95 for housing, financial, and unemployment, respectively.

DISCUSSION: We developed 3 rule-based NLP algorithms, trained across health systems. Although this is a less sophisticated approach, the algorithms demonstrated a high degree of generalizability, maintaining values >0.85 across all predictive performance metrics.

CONCLUSION: The rule-based NLP algorithms demonstrated consistent performance in identifying 3 social factors within clinical text. These methods may form part of an institution's strategy for measuring social factors.
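The abstract describes n-gram keyword identification, rule-based NLP state machines, and evaluation by PPV, NPV, sensitivity, and specificity against manual review. The authors' actual rules are not part of this record, so the following is only a minimal Python sketch of that general approach for one factor (housing). Everything in it is an assumption for illustration: the cue lists `TRIGGERS` and `NEGATIONS`, the negation window `MAX_GAP`, and the helper names `ngram_counts`, `classify`, and `evaluate` are hypothetical, and a simple three-state negation machine stands in for whatever transitions the published algorithms use.

```python
import re
from collections import Counter
from dataclasses import dataclass
from enum import Enum, auto

# HYPOTHETICAL cue lists for housing instability; the study's real keyword
# lists (derived from n-gram review) are not given in this record.
TRIGGERS = {("homeless",), ("eviction",), ("evicted",), ("lost", "housing")}
NEGATIONS = {("denies",), ("no",), ("not",)}
BOUNDARIES = {".", ";"}
MAX_GAP = 3  # HYPOTHETICAL: tokens a negation cue stays in scope


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens plus sentence-boundary punctuation.
    return re.findall(r"[a-z]+|[.;]", text.lower())


def ngram_counts(notes: list[str], n: int = 2) -> Counter:
    # Count n-grams across notes to surface candidate keywords for manual review.
    counts: Counter = Counter()
    for note in notes:
        toks = [t for t in tokenize(note) if t not in BOUNDARIES]
        counts.update(zip(*(toks[k:] for k in range(n))))
    return counts


class State(Enum):
    SEARCH = auto()    # no open negation; looking for a trigger
    NEGATED = auto()   # a negation cue is in scope; triggers are ignored
    POSITIVE = auto()  # accepting state: an un-negated trigger was seen


def match_at(tokens: list[str], i: int, phrases: set) -> int:
    # Length of a phrase from `phrases` that starts at tokens[i], else 0.
    for phrase in phrases:
        if tuple(tokens[i:i + len(phrase)]) == phrase:
            return len(phrase)
    return 0


def classify(text: str) -> bool:
    # Scan tokens left to right, changing state on boundary, negation, and trigger cues.
    tokens = tokenize(text)
    state, gap, i = State.SEARCH, 0, 0
    while i < len(tokens) and state is not State.POSITIVE:
        if tokens[i] in BOUNDARIES:
            state = State.SEARCH           # sentence boundary closes any negation
            i += 1
            continue
        if match_at(tokens, i, NEGATIONS):
            state, gap = State.NEGATED, 0  # open a new negation scope
            i += 1
            continue
        n = match_at(tokens, i, TRIGGERS)
        if n and state is State.SEARCH:
            state = State.POSITIVE         # un-negated trigger: accept
        elif state is State.NEGATED:
            gap += 1
            if gap > MAX_GAP:
                state = State.SEARCH       # negation scope has expired
        i += max(n, 1)
    return state is State.POSITIVE


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


@dataclass
class Confusion:
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0

    def ppv(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    def npv(self) -> float:
        return _ratio(self.tn, self.tn + self.fn)

    def sensitivity(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        return _ratio(self.tn, self.tn + self.fp)


def evaluate(notes: list[str], labels: list[bool]) -> Confusion:
    # Compare predictions with manual-review labels (True = factor present).
    c = Confusion()
    for note, gold in zip(notes, labels):
        pred = classify(note)
        if pred and gold:
            c.tp += 1
        elif pred:
            c.fp += 1
        elif gold:
            c.fn += 1
        else:
            c.tn += 1
    return c
```

For example, `classify("Patient denies eviction.")` returns `False` because the trigger falls inside an open negation scope, while `evaluate(notes, labels).ppv()` gives PPV for a manually reviewed sample. Keeping each transition explicit makes every flagged note traceable to the cue that triggered it.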