Enhancing an enterprise data warehouse for research with data extracted using natural language processing

OBJECTIVE: This study aims to develop a generalizable architecture for enhancing an enterprise data warehouse for research (EDW4R) with results from a natural language processing (NLP) model, which allows discrete data derived from clinical notes to be made broadly available for research use without...

Bibliographic Details
Main Authors: Magoc, Tanja, Everson, Russell, Harle, Christopher A.
Format: Online Article Text
Language: English
Published: Cambridge University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10346024/
https://www.ncbi.nlm.nih.gov/pubmed/37456264
http://dx.doi.org/10.1017/cts.2023.575
_version_ 1785073221694914560
author Magoc, Tanja
Everson, Russell
Harle, Christopher A.
author_facet Magoc, Tanja
Everson, Russell
Harle, Christopher A.
author_sort Magoc, Tanja
collection PubMed
description OBJECTIVE: This study aims to develop a generalizable architecture for enhancing an enterprise data warehouse for research (EDW4R) with results from a natural language processing (NLP) model, which allows discrete data derived from clinical notes to be made broadly available for research use without need for NLP expertise. The study also quantifies the additional value that information extracted from clinical narratives brings to EDW4R. MATERIALS AND METHODS: Clinical notes written during one month at an academic health center were used to evaluate the performance of an existing NLP model and to quantify its value added to the structured data. Manual review was utilized for performance analysis. The architecture for enhancing the EDW4R is described in detail to enable reproducibility. RESULTS: Two weeks were needed to enhance EDW4R with data from 250 million clinical notes. NLP generated 16 and 39% increase in data availability for two variables. DISCUSSION: Our architecture is highly generalizable to a new NLP model. The positive predictive value obtained by an independent team showed only slightly lower NLP performance than the values reported by the NLP developers. The NLP showed significant value added to data already available in structured format. CONCLUSION: Given the value added by data extracted using NLP, it is important to enhance EDW4R with these data to enable research teams without NLP expertise to benefit from value added by NLP models.
format Online
Article
Text
id pubmed-10346024
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cambridge University Press
record_format MEDLINE/PubMed
spelling pubmed-10346024 2023-07-15 Enhancing an enterprise data warehouse for research with data extracted using natural language processing Magoc, Tanja Everson, Russell Harle, Christopher A. J Clin Transl Sci Research Article OBJECTIVE: This study aims to develop a generalizable architecture for enhancing an enterprise data warehouse for research (EDW4R) with results from a natural language processing (NLP) model, which allows discrete data derived from clinical notes to be made broadly available for research use without need for NLP expertise. The study also quantifies the additional value that information extracted from clinical narratives brings to EDW4R. MATERIALS AND METHODS: Clinical notes written during one month at an academic health center were used to evaluate the performance of an existing NLP model and to quantify its value added to the structured data. Manual review was utilized for performance analysis. The architecture for enhancing the EDW4R is described in detail to enable reproducibility. RESULTS: Two weeks were needed to enhance EDW4R with data from 250 million clinical notes. NLP generated 16 and 39% increase in data availability for two variables. DISCUSSION: Our architecture is highly generalizable to a new NLP model. The positive predictive value obtained by an independent team showed only slightly lower NLP performance than the values reported by the NLP developers. The NLP showed significant value added to data already available in structured format. CONCLUSION: Given the value added by data extracted using NLP, it is important to enhance EDW4R with these data to enable research teams without NLP expertise to benefit from value added by NLP models.
Cambridge University Press 2023-06-13 /pmc/articles/PMC10346024/ /pubmed/37456264 http://dx.doi.org/10.1017/cts.2023.575 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
spellingShingle Research Article
Magoc, Tanja
Everson, Russell
Harle, Christopher A.
Enhancing an enterprise data warehouse for research with data extracted using natural language processing
title Enhancing an enterprise data warehouse for research with data extracted using natural language processing
title_sort enhancing an enterprise data warehouse for research with data extracted using natural language processing
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10346024/
https://www.ncbi.nlm.nih.gov/pubmed/37456264
http://dx.doi.org/10.1017/cts.2023.575
work_keys_str_mv AT magoctanja enhancinganenterprisedatawarehouseforresearchwithdataextractedusingnaturallanguageprocessing
AT eversonrussell enhancinganenterprisedatawarehouseforresearchwithdataextractedusingnaturallanguageprocessing
AT harlechristophera enhancinganenterprisedatawarehouseforresearchwithdataextractedusingnaturallanguageprocessing