Natural language processing-driven state machines to extract social factors from unstructured clinical documentation
OBJECTIVE: This study sought to create natural language processing algorithms to extract the presence of social factors from clinical text in 3 areas: (1) housing, (2) financial, and (3) unemployment. Finalized models were validated on data from a separate health system for gen...
Main Authors: | Allen, Katie S; Hood, Dan R; Cummins, Jonathan; Kasturi, Suranga; Mendonca, Eneida A; Vest, Joshua R |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Research and Applications |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10112959/ https://www.ncbi.nlm.nih.gov/pubmed/37081945 http://dx.doi.org/10.1093/jamiaopen/ooad024 |
_version_ | 1785027725396803584 |
---|---|
author | Allen, Katie S; Hood, Dan R; Cummins, Jonathan; Kasturi, Suranga; Mendonca, Eneida A; Vest, Joshua R |
author_sort | Allen, Katie S |
collection | PubMed |
description | OBJECTIVE: This study sought to create natural language processing algorithms to extract the presence of social factors from clinical text in 3 areas: (1) housing, (2) financial, and (3) unemployment. For generalizability, finalized models were validated on data from a separate health system. MATERIALS AND METHODS: Notes from 2 healthcare systems, representing a variety of note types, were utilized. To train models, the study utilized n-grams to identify keywords and implemented natural language processing (NLP) state machines across all note types. Manual review was conducted to determine performance. Sampling was based on a set percentage of notes, based on the prevalence of social need. Models were optimized over multiple training and evaluation cycles. Performance metrics were calculated using positive predictive value (PPV), negative predictive value, sensitivity, and specificity. RESULTS: PPV for housing rose from 0.71 to 0.95 over 3 training runs. PPV for financial rose from 0.83 to 0.89 over 2 training iterations, while PPV for unemployment rose from 0.78 to 0.88 over 3 iterations. The test data resulted in PPVs of 0.94, 0.97, and 0.95 for housing, financial, and unemployment, respectively. Final specificity scores were 0.95, 0.97, and 0.95 for housing, financial, and unemployment, respectively. DISCUSSION: We developed 3 rule-based NLP algorithms, trained across health systems. While this is a less sophisticated approach, the algorithms demonstrated a high degree of generalizability, maintaining >0.85 across all predictive performance metrics. CONCLUSION: The rule-based NLP algorithms demonstrated consistent performance in identifying 3 social factors within clinical text. These methods may be a part of a strategy to measure social factors within an institution. |
format | Online Article Text |
id | pubmed-10112959 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10112959 2023-04-19 Natural language processing-driven state machines to extract social factors from unstructured clinical documentation Allen, Katie S; Hood, Dan R; Cummins, Jonathan; Kasturi, Suranga; Mendonca, Eneida A; Vest, Joshua R JAMIA Open Research and Applications OBJECTIVE: This study sought to create natural language processing algorithms to extract the presence of social factors from clinical text in 3 areas: (1) housing, (2) financial, and (3) unemployment. For generalizability, finalized models were validated on data from a separate health system. MATERIALS AND METHODS: Notes from 2 healthcare systems, representing a variety of note types, were utilized. To train models, the study utilized n-grams to identify keywords and implemented natural language processing (NLP) state machines across all note types. Manual review was conducted to determine performance. Sampling was based on a set percentage of notes, based on the prevalence of social need. Models were optimized over multiple training and evaluation cycles. Performance metrics were calculated using positive predictive value (PPV), negative predictive value, sensitivity, and specificity. RESULTS: PPV for housing rose from 0.71 to 0.95 over 3 training runs. PPV for financial rose from 0.83 to 0.89 over 2 training iterations, while PPV for unemployment rose from 0.78 to 0.88 over 3 iterations. The test data resulted in PPVs of 0.94, 0.97, and 0.95 for housing, financial, and unemployment, respectively. Final specificity scores were 0.95, 0.97, and 0.95 for housing, financial, and unemployment, respectively. DISCUSSION: We developed 3 rule-based NLP algorithms, trained across health systems. While this is a less sophisticated approach, the algorithms demonstrated a high degree of generalizability, maintaining >0.85 across all predictive performance metrics. CONCLUSION: The rule-based NLP algorithms demonstrated consistent performance in identifying 3 social factors within clinical text. These methods may be a part of a strategy to measure social factors within an institution. Oxford University Press 2023-04-18 /pmc/articles/PMC10112959/ /pubmed/37081945 http://dx.doi.org/10.1093/jamiaopen/ooad024 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Natural language processing-driven state machines to extract social factors from unstructured clinical documentation |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10112959/ https://www.ncbi.nlm.nih.gov/pubmed/37081945 http://dx.doi.org/10.1093/jamiaopen/ooad024 |
work_keys_str_mv | AT allenkaties naturallanguageprocessingdrivenstatemachinestoextractsocialfactorsfromunstructuredclinicaldocumentation AT hooddanr naturallanguageprocessingdrivenstatemachinestoextractsocialfactorsfromunstructuredclinicaldocumentation AT cumminsjonathan naturallanguageprocessingdrivenstatemachinestoextractsocialfactorsfromunstructuredclinicaldocumentation AT kasturisuranga naturallanguageprocessingdrivenstatemachinestoextractsocialfactorsfromunstructuredclinicaldocumentation AT mendoncaeneidaa naturallanguageprocessingdrivenstatemachinestoextractsocialfactorsfromunstructuredclinicaldocumentation AT vestjoshuar naturallanguageprocessingdrivenstatemachinestoextractsocialfactorsfromunstructuredclinicaldocumentation |
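
The abstract reports four performance metrics (PPV, negative predictive value, sensitivity, and specificity) derived from manual review. As a minimal sketch of how such metrics follow from confusion-matrix counts, the Python below uses the standard definitions; the `ConfusionCounts` container, the `performance_metrics` function, and the example counts are hypothetical illustrations, not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing algorithm output with manual review (hypothetical container)."""
    tp: int  # algorithm flagged the note and the reviewer agreed
    fp: int  # algorithm flagged the note but the reviewer disagreed
    fn: int  # algorithm missed a note the reviewer marked positive
    tn: int  # algorithm and reviewer both marked the note negative


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def performance_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the four metrics named in the abstract using their standard definitions."""
    return {
        "ppv": _ratio(c.tp, c.tp + c.fp),          # positive predictive value
        "npv": _ratio(c.tn, c.tn + c.fn),          # negative predictive value
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not results from the study.
    example = ConfusionCounts(tp=90, fp=10, fn=5, tn=95)
    for name, value in performance_metrics(example).items():
        print(f"{name}: {value:.2f}")
```

Returning NaN for an empty denominator is one design choice among several; it keeps a metric undefined rather than reporting a misleading 0 or 1 when a sample contains no flagged or no positive notes.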