Unlocking echocardiogram measurements for heart disease research through natural language processing

BACKGROUND: In order to investigate the mechanisms of cardiovascular disease in HIV-infected and uninfected patients, an analysis of echocardiogram reports is required for a large longitudinal multi-center study. IMPLEMENTATION: A natural language processing system using a dictionary lookup, rules,...
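
As a rough illustration of what extracting measurement-value pairs with a dictionary lookup plus patterns can look like, here is a minimal Python sketch. It is not the authors' system: the term dictionary, the regular expression, the 0 to 100 percent plausibility check, and the function name `extract_measurements` are all assumptions made for this example.

```python
import re

# Hypothetical mini-dictionary mapping surface forms to a canonical name.
# A real dictionary would be far larger and curated from actual notes.
MEASUREMENT_TERMS = {
    "left ventricular ejection fraction": "LVEF",
    "ejection fraction": "LVEF",
    "lvef": "LVEF",
    "ef": "LVEF",
}

# Longest terms first so multi-word forms win over their substrings.
_TERM_ALTERNATION = "|".join(
    re.escape(term) for term in sorted(MEASUREMENT_TERMS, key=len, reverse=True)
)

# Pattern: a dictionary term, a short gap without digits, then a percentage
# that is either a single value ("55%") or a range ("55-60%").
PAIR_PATTERN = re.compile(
    rf"\b(?P<term>{_TERM_ALTERNATION})\b"
    r"(?P<gap>[^\d\n.;]{0,30}?)"
    r"(?P<low>\d{1,3}(?:\.\d+)?)"
    r"(?:\s*(?:-|to)\s*(?P<high>\d{1,3}(?:\.\d+)?))?"
    r"\s*%",
    re.IGNORECASE,
)


def extract_measurements(text):
    """Return a list of (canonical_name, low, high) tuples found in text."""
    results = []
    for match in PAIR_PATTERN.finditer(text):
        low = float(match.group("low"))
        high = float(match.group("high")) if match.group("high") else low
        # Assumed plausibility rule: a percentage must lie in 0..100 and
        # the range must be ordered; anything else is discarded.
        if not (0 <= low <= high <= 100):
            continue
        name = MEASUREMENT_TERMS[match.group("term").lower()]
        results.append((name, low, high))
    return results


if __name__ == "__main__":
    print(extract_measurements("LVEF is estimated at 55-60%."))
```

The range check only gestures at the idea of constraining a term by the kind of value it can take; the disambiguation method described in the article is not reproduced here.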

Bibliographic Details
Main Authors: Patterson, Olga V., Freiberg, Matthew S., Skanderson, Melissa, J. Fodeh, Samah, Brandt, Cynthia A., DuVall, Scott L.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5469017/
https://www.ncbi.nlm.nih.gov/pubmed/28606104
http://dx.doi.org/10.1186/s12872-017-0580-8
_version_ 1783243503753494528
author Patterson, Olga V.
Freiberg, Matthew S.
Skanderson, Melissa
J. Fodeh, Samah
Brandt, Cynthia A.
DuVall, Scott L.
author_facet Patterson, Olga V.
Freiberg, Matthew S.
Skanderson, Melissa
J. Fodeh, Samah
Brandt, Cynthia A.
DuVall, Scott L.
author_sort Patterson, Olga V.
collection PubMed
description BACKGROUND: In order to investigate the mechanisms of cardiovascular disease in HIV-infected and uninfected patients, an analysis of echocardiogram reports is required for a large longitudinal multi-center study. IMPLEMENTATION: A natural language processing system using a dictionary lookup, rules, and patterns was developed to extract heart function measurements that are typically recorded in echocardiogram reports as measurement-value pairs. Curated semantic bootstrapping was used to create a custom dictionary that extends existing terminologies based on terms that actually appear in the medical record. A novel disambiguation method based on semantic constraints was created to identify and discard erroneous alternative definitions of the measurement terms. The system was built utilizing a scalable framework, making it available for processing large datasets. RESULTS: The system was developed for and validated on notes from three sources: general clinic notes, echocardiogram reports, and radiology reports. The system achieved F-scores of 0.872, 0.844, and 0.877 with precision of 0.936, 0.982, and 0.969 for each dataset, respectively, averaged across all extracted values. Left ventricular ejection fraction (LVEF) is the most frequently extracted measurement. The precision of extraction of the LVEF measure ranged from 0.968 to 1.0 across different document types. CONCLUSIONS: This system illustrates the feasibility and effectiveness of large-scale information extraction on clinical data. New clinical questions can be addressed in the domain of heart failure using retrospective clinical data analysis because key heart function measurements can be successfully extracted using natural language processing. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12872-017-0580-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5469017
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5469017 2017-06-14 Unlocking echocardiogram measurements for heart disease research through natural language processing Patterson, Olga V. Freiberg, Matthew S. Skanderson, Melissa J. Fodeh, Samah Brandt, Cynthia A. DuVall, Scott L. BMC Cardiovasc Disord Software BioMed Central 2017-06-12 /pmc/articles/PMC5469017/ /pubmed/28606104 http://dx.doi.org/10.1186/s12872-017-0580-8 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Patterson, Olga V.
Freiberg, Matthew S.
Skanderson, Melissa
J. Fodeh, Samah
Brandt, Cynthia A.
DuVall, Scott L.
Unlocking echocardiogram measurements for heart disease research through natural language processing
title Unlocking echocardiogram measurements for heart disease research through natural language processing
title_full Unlocking echocardiogram measurements for heart disease research through natural language processing
title_fullStr Unlocking echocardiogram measurements for heart disease research through natural language processing
title_full_unstemmed Unlocking echocardiogram measurements for heart disease research through natural language processing
title_short Unlocking echocardiogram measurements for heart disease research through natural language processing
title_sort unlocking echocardiogram measurements for heart disease research through natural language processing
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5469017/
https://www.ncbi.nlm.nih.gov/pubmed/28606104
http://dx.doi.org/10.1186/s12872-017-0580-8
work_keys_str_mv AT pattersonolgav unlockingechocardiogrammeasurementsforheartdiseaseresearchthroughnaturallanguageprocessing
AT freibergmatthews unlockingechocardiogrammeasurementsforheartdiseaseresearchthroughnaturallanguageprocessing
AT skandersonmelissa unlockingechocardiogrammeasurementsforheartdiseaseresearchthroughnaturallanguageprocessing
AT jfodehsamah unlockingechocardiogrammeasurementsforheartdiseaseresearchthroughnaturallanguageprocessing
AT brandtcynthiaa unlockingechocardiogrammeasurementsforheartdiseaseresearchthroughnaturallanguageprocessing
AT duvallscottl unlockingechocardiogrammeasurementsforheartdiseaseresearchthroughnaturallanguageprocessing
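
The description in this record reports precision and F-score figures. For readers unfamiliar with how such figures relate to raw counts, the sketch below computes precision, recall, and the balanced F-score (F1) from true-positive, false-positive, and false-negative counts. It assumes the reported F-scores are balanced F1 values, which the record does not state; the function name and the counts in the usage line are hypothetical and are not taken from the article.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from raw counts; 0.0 where undefined."""
    predicted = true_pos + false_pos   # everything the system extracted
    actual = true_pos + false_neg      # everything it should have extracted
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(precision_recall_f1(90, 10, 30))
```

Because F1 is the harmonic mean of precision and recall, a high precision paired with a noticeably lower F-score indicates that recall is the weaker of the two.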