Automated data extraction—A feasible way to construct patient registers of primary care utilization
INTRODUCTION. Electronic medical records (EMRs) enable analysis of health care data by using data mining techniques to build research databases. Though the reliability of the data extraction process is crucial for the credibility of the final analysis, there are few published validations of this pro...
Main Authors: | Martinell, Mats; Stålhammar, Jan; Hallqvist, Johan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Informa Healthcare 2012 |
Subjects: | Original Articles |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3282243/ https://www.ncbi.nlm.nih.gov/pubmed/22335391 http://dx.doi.org/10.3109/03009734.2011.653015 |
_version_ | 1782224054521954304 |
---|---|
author | Martinell, Mats Stålhammar, Jan Hallqvist, Johan |
author_facet | Martinell, Mats Stålhammar, Jan Hallqvist, Johan |
author_sort | Martinell, Mats |
collection | PubMed |
description | INTRODUCTION. Electronic medical records (EMRs) enable analysis of health care data by using data mining techniques to build research databases. Though the reliability of the data extraction process is crucial for the credibility of the final analysis, there are few published validations of this process. In this paper we validate the performance of an automated data mining tool on EMR in a primary care setting. METHODS. The Pygargus Customized eXtraction Program (CXP) was programmed to find and then extract data from patients meeting criteria for type 2 diabetes mellitus (T2DM) at one primary health care clinic (PHC). The ability of CXP to extract relevant cases was assessed by comparison with cases extracted by an EMR-integrated search engine. The concordance of extracted data with the original EMR source was checked manually. RESULTS. Prevalence of T2DM was 4.0%, which corresponds well to previous estimates. By searching for drug prescriptions, diagnosis codes, and laboratory values, 38%, 53%, and 91% of relevant cases were found, respectively. The sensitivity of CXP regarding extraction of relevant cases was 100%. The specificity was 99.9% due to 12 non-T2DM cases extracted. The congruity at single-item level was 99.6%. The 13 incorrect data items were all located in the same structural module. CONCLUSION. The CXP is a reliable and accurate data mining tool to extract selective data from EMR. |
format | Online Article Text |
id | pubmed-3282243 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Informa Healthcare |
record_format | MEDLINE/PubMed |
spelling | pubmed-32822432012-03-01 Automated data extraction—A feasible way to construct patient registers of primary care utilization Martinell, Mats Stålhammar, Jan Hallqvist, Johan Ups J Med Sci Original Articles INTRODUCTION. Electronic medical records (EMRs) enable analysis of health care data by using data mining techniques to build research databases. Though the reliability of the data extraction process is crucial for the credibility of the final analysis, there are few published validations of this process. In this paper we validate the performance of an automated data mining tool on EMR in a primary care setting. METHODS. The Pygargus Customized eXtraction Program (CXP) was programmed to find and then extract data from patients meeting criteria for type 2 diabetes mellitus (T2DM) at one primary health care clinic (PHC). The ability of CXP to extract relevant cases was assessed by comparison with cases extracted by an EMR-integrated search engine. The concordance of extracted data with the original EMR source was checked manually. RESULTS. Prevalence of T2DM was 4.0%, which corresponds well to previous estimates. By searching for drug prescriptions, diagnosis codes, and laboratory values, 38%, 53%, and 91% of relevant cases were found, respectively. The sensitivity of CXP regarding extraction of relevant cases was 100%. The specificity was 99.9% due to 12 non-T2DM cases extracted. The congruity at single-item level was 99.6%. The 13 incorrect data items were all located in the same structural module. CONCLUSION. The CXP is a reliable and accurate data mining tool to extract selective data from EMR. 
Informa Healthcare 2012-03 2012-02-15 /pmc/articles/PMC3282243/ /pubmed/22335391 http://dx.doi.org/10.3109/03009734.2011.653015 Text en © Informa Healthcare http://creativecommons.org/licenses/by/2.5/ This is an open-access article distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the source is credited. |
spellingShingle | Original Articles Martinell, Mats Stålhammar, Jan Hallqvist, Johan Automated data extraction—A feasible way to construct patient registers of primary care utilization |
title | Automated data extraction—A feasible way to construct patient registers of primary care utilization |
title_full | Automated data extraction—A feasible way to construct patient registers of primary care utilization |
title_fullStr | Automated data extraction—A feasible way to construct patient registers of primary care utilization |
title_full_unstemmed | Automated data extraction—A feasible way to construct patient registers of primary care utilization |
title_short | Automated data extraction—A feasible way to construct patient registers of primary care utilization |
title_sort | automated data extraction—a feasible way to construct patient registers of primary care utilization |
topic | Original Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3282243/ https://www.ncbi.nlm.nih.gov/pubmed/22335391 http://dx.doi.org/10.3109/03009734.2011.653015 |
work_keys_str_mv | AT martinellmats automateddataextractionafeasiblewaytoconstructpatientregistersofprimarycareutilization AT stalhammarjan automateddataextractionafeasiblewaytoconstructpatientregistersofprimarycareutilization AT hallqvistjohan automateddataextractionafeasiblewaytoconstructpatientregistersofprimarycareutilization |
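
The abstract in this record reports the extraction tool's sensitivity and specificity without showing how the two figures are derived. The sketch below illustrates the standard definitions only. The function names and the counts are hypothetical: the record gives the 12 non-T2DM cases extracted, but not the other cell counts, so the remaining numbers are placeholders chosen for illustration and are not taken from the study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly relevant cases that the extraction found."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of non-relevant cases that the extraction correctly left out."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (not from the study),
# apart from the 12 false positives mentioned in the abstract.
tp, fn = 400, 0        # relevant cases found / relevant cases missed
fp, tn = 12, 11_988    # non-relevant cases extracted / correctly excluded

print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
```

With no missed cases the sensitivity is 100% regardless of population size, while the specificity depends on how many non-relevant patients the clinic has, which is why a handful of false positives can still leave it close to 100%.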