
EXTraction of EMR numerical data: an efficient and generalizable tool to EXTEND clinical research

BACKGROUND: Electronic medical records (EMR) contain numerical data important for clinical outcomes research, such as vital signs and cardiac ejection fractions (EF), which tend to be embedded in narrative clinical notes. In current practice, this data is often manually extracted for use in research...


Bibliographic Details
Main Authors: Cai, Tianrun, Zhang, Luwan, Yang, Nicole, Kumamaru, Kanako K., Rybicki, Frank J., Cai, Tianxi, Liao, Katherine P.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6858776/
https://www.ncbi.nlm.nih.gov/pubmed/31730484
http://dx.doi.org/10.1186/s12911-019-0970-1
author Cai, Tianrun
Zhang, Luwan
Yang, Nicole
Kumamaru, Kanako K.
Rybicki, Frank J.
Cai, Tianxi
Liao, Katherine P.
author_sort Cai, Tianrun
collection PubMed
description BACKGROUND: Electronic medical records (EMR) contain numerical data important for clinical outcomes research, such as vital signs and cardiac ejection fractions (EF), which tend to be embedded in narrative clinical notes. In current practice, these data are often manually extracted for use in research studies. However, due to the large volume of notes in datasets, manually extracting numerical data often becomes infeasible. The objective of this study is to develop and validate a natural language processing (NLP) tool that can efficiently extract numerical clinical data from narrative notes.
RESULTS: To validate the accuracy of the tool EXTraction of EMR Numerical Data (EXTEND), we developed a reference standard by manually extracting vital signs from 285 notes, EF values from 300 notes, glycated hemoglobin (HbA1C), and serum creatinine from 890 notes. For each parameter of interest, we calculated the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and F1 score of EXTEND using two metrics: (1) completion of data extraction, and (2) accuracy of data extraction compared to the actual values in the note verified by chart review. At the note level, extraction by EXTEND was considered correct only if it accurately detected and extracted all values of interest in a note. Using manually annotated labels as the gold standard, the note-level accuracy of EXTEND in capturing the numerical vital sign values, EF, HbA1C and creatinine ranged from 0.88 to 0.95 for sensitivity, 0.95 to 1.0 for specificity, 0.95 to 1.0 for PPV, 0.89 to 0.99 for NPV, and 0.92 to 0.96 for F1 score. At the actual value level, the sensitivity, PPV, and F1 score of EXTEND ranged from 0.91 to 0.95, 0.95 to 1.0 and 0.95 to 0.96, respectively.
CONCLUSIONS: EXTEND is an efficient, flexible tool that uses knowledge-based rules to extract clinical numerical parameters with high accuracy. By increasing dictionary terms and developing new rules, the usage of EXTEND can easily be expanded to extract additional numerical data important in clinical outcomes research.
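The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. This is not the EXTEND implementation: the names `Counts`, `metrics` and `note_is_correct` are hypothetical, the counts used below are invented for illustration and are not study data, and the note-level rule is one plausible reading of "correct only if all values of interest in a note are extracted".

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts (hypothetical container, not from the paper)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def metrics(c: Counts) -> dict:
    """Compute the five measures listed in the abstract from raw counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)
    specificity = _ratio(c.tn, c.tn + c.fp)
    ppv = _ratio(c.tp, c.tp + c.fp)
    npv = _ratio(c.tn, c.tn + c.fn)
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


def note_is_correct(extracted: dict, reference: dict) -> bool:
    """Assumed note-level rule: every reference value must be extracted exactly."""
    return all(extracted.get(name) == value for name, value in reference.items())


if __name__ == "__main__":
    # Invented counts, used only to show the arithmetic.
    for name, value in metrics(Counts(tp=90, fp=2, tn=98, fn=10)).items():
        print(f"{name}: {value:.3f}")
```

Under this reading, a note whose reference standard holds both an EF and a heart rate counts as incorrect if only the EF is extracted, which is why note-level figures can be lower than value-level ones.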
format Online
Article
Text
id pubmed-6858776
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6858776 2019-11-29 EXTraction of EMR numerical data: an efficient and generalizable tool to EXTEND clinical research Cai, Tianrun Zhang, Luwan Yang, Nicole Kumamaru, Kanako K. Rybicki, Frank J. Cai, Tianxi Liao, Katherine P. BMC Med Inform Decis Mak Software BioMed Central 2019-11-15 /pmc/articles/PMC6858776/ /pubmed/31730484 http://dx.doi.org/10.1186/s12911-019-0970-1 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title EXTraction of EMR numerical data: an efficient and generalizable tool to EXTEND clinical research
title_sort extraction of emr numerical data: an efficient and generalizable tool to extend clinical research
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6858776/
https://www.ncbi.nlm.nih.gov/pubmed/31730484
http://dx.doi.org/10.1186/s12911-019-0970-1