
Text data extraction for a prospective, research-focused data mart: implementation and validation

BACKGROUND: Translational research typically requires data abstracted from medical records as well as data collected specifically for research. Unfortunately, many data within electronic health records are represented as text that is not amenable to aggregation for analyses. We present a scalable op...


Bibliographic Details
Main Authors:	Hinchcliff, Monique; Just, Eric; Podlusky, Sofia; Varga, John; Chang, Rowland W; Kibbe, Warren A
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Correspondence
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3537747/ https://www.ncbi.nlm.nih.gov/pubmed/22970696 http://dx.doi.org/10.1186/1472-6947-12-106

collection	PubMed
description	BACKGROUND: Translational research typically requires data abstracted from medical records as well as data collected specifically for research. Unfortunately, many data within electronic health records are represented as text that is not amenable to aggregation for analyses. We present a scalable open source SQL Server Integration Services package, called Regextractor, for including regular expression parsers into a classic extract, transform, and load workflow. We have used Regextractor to abstract discrete data from textual reports from a number of ‘machine generated’ sources. To validate this package, we created a pulmonary function test data mart and analyzed the quality of the data mart versus manual chart review. METHODS: Eleven variables from pulmonary function tests performed closest to the initial clinical evaluation date were studied for 100 randomly selected subjects with scleroderma. One research assistant manually reviewed, abstracted, and entered relevant data into a database. Correlation with data obtained from the automated pulmonary function test data mart within the Northwestern Medical Enterprise Data Warehouse was determined. RESULTS: There was a near perfect (99.5%) agreement between results generated from the Regextractor package and those obtained via manual chart abstraction. The pulmonary function test data mart has been used subsequently to monitor disease progression of patients in the Northwestern Scleroderma Registry. In addition to the pulmonary function test example presented in this manuscript, the Regextractor package has been used to create cardiac catheterization and echocardiography data marts. The Regextractor package was released as open source software in October 2009 and has been downloaded 552 times as of 6/1/2012. CONCLUSIONS: Collaboration between clinical researchers and biomedical informatics experts enabled the development and validation of a tool (Regextractor) to parse, abstract and assemble structured data from text data contained in the electronic health record. Regextractor has been successfully used to create additional data marts in other medical domains and is available to the public.
format	Online Article Text
id	pubmed-3537747
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3537747 2013-01-10 Text data extraction for a prospective, research-focused data mart: implementation and validation BMC Med Inform Decis Mak Correspondence BioMed Central 2012-09-13 /pmc/articles/PMC3537747/ /pubmed/22970696 http://dx.doi.org/10.1186/1472-6947-12-106 Text en Copyright ©2012 Hinchcliff et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic	Correspondence


Similar Items