
Data-driven approach for creating synthetic electronic medical records

BACKGROUND: New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great diffic...


Bibliographic Details
Main Authors: Buczak, Anna L; Babin, Steven; Moniz, Linda
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2972239/
https://www.ncbi.nlm.nih.gov/pubmed/20946670
http://dx.doi.org/10.1186/1472-6947-10-59
author Buczak, Anna L
Babin, Steven
Moniz, Linda
collection PubMed
description BACKGROUND: New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great difficulty in comparing the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms operating on full EMRs and the lack of non-identifiable EMR data, a method for generating complete and synthetic EMRs was developed. METHODS: This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method developed has three major steps: 1) synthetic patient identity and basic information generation; 2) identification of care patterns that the synthetic patients would receive based on the information present in real EMR data for similar health problems; 3) adaptation of these care patterns to the synthetic patient population. RESULTS: We generated EMRs, including visit records, clinical activity, laboratory orders/results and radiology orders/results for 203 synthetic tularemia outbreak patients. Validation of the records by a medical expert revealed problems in 19% of the records; these were subsequently corrected. We also generated background EMRs for over 3000 patients in the 4-11 yr age group. Validation of those records by a medical expert revealed problems in fewer than 3% of these background patient EMRs and the errors were subsequently rectified. CONCLUSIONS: A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, prescription orders). The pilot synthetic outbreak records were for tularemia but our approach may be adapted to other infectious diseases. The pilot synthetic background records were in the 4-11 year old age group. The adaptations that must be made to the algorithms to produce synthetic background EMRs for other age groups are indicated.
format Text
id pubmed-2972239
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2972239 2010-11-05 Data-driven approach for creating synthetic electronic medical records Buczak, Anna L Babin, Steven Moniz, Linda BMC Med Inform Decis Mak Research Article BioMed Central 2010-10-14 /pmc/articles/PMC2972239/ /pubmed/20946670 http://dx.doi.org/10.1186/1472-6947-10-59 Text en Copyright ©2010 Buczak et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Data-driven approach for creating synthetic electronic medical records
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2972239/
https://www.ncbi.nlm.nih.gov/pubmed/20946670
http://dx.doi.org/10.1186/1472-6947-10-59