De-identification of primary care electronic medical records free-text data in Ontario, Canada
BACKGROUND: Electronic medical records (EMRs) represent a potentially rich source of health information for research but the free-text in EMRs often contains identifying information. While de-identification tools have been developed for free-text, none have been developed or tested for the full rang...
Main authors: | Tu, Karen; Klein-Geltink, Julie; Mitiku, Tezeta F; Mihai, Chiriac; Martin, Joel |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2907300/ https://www.ncbi.nlm.nih.gov/pubmed/20565894 http://dx.doi.org/10.1186/1472-6947-10-35 |
_version_ | 1782184093007478784 |
---|---|
author | Tu, Karen Klein-Geltink, Julie Mitiku, Tezeta F Mihai, Chiriac Martin, Joel |
author_facet | Tu, Karen Klein-Geltink, Julie Mitiku, Tezeta F Mihai, Chiriac Martin, Joel |
author_sort | Tu, Karen |
collection | PubMed |
description | BACKGROUND: Electronic medical records (EMRs) represent a potentially rich source of health information for research but the free-text in EMRs often contains identifying information. While de-identification tools have been developed for free-text, none have been developed or tested for the full range of primary care EMR data. METHODS: We used deid open source de-identification software and modified it for an Ontario context for use on primary care EMR data. We developed the modified program on a training set of 1000 free-text records from one group practice and then tested it on two validation sets from a random sample of 700 free-text EMR records from 17 different physicians from 7 different practices in 5 different cities and 500 free-text records from a group practice that was in a different city than the group practice that was used for the training set. We measured the sensitivity/recall, precision, specificity, accuracy and F-measure of the modified tool against manually tagged free-text records to remove patient and physician names, locations, addresses, medical record, health card and telephone numbers. RESULTS: We found that the modified training program performed with a sensitivity of 88.3%, specificity of 91.4%, precision of 91.3%, accuracy of 89.9% and F-measure of 0.90. The validation sets had sensitivities of 86.7% and 80.2%, specificities of 91.4% and 87.7%, precisions of 91.1% and 87.4%, accuracies of 89.0% and 83.8% and F-measures of 0.89 and 0.84 for the first and second validation sets, respectively. CONCLUSION: The deid program can be modified to reasonably accurately de-identify free-text primary care EMR records while preserving clinical content. |
format | Text |
id | pubmed-2907300 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-29073002010-07-21 De-identification of primary care electronic medical records free-text data in Ontario, Canada Tu, Karen Klein-Geltink, Julie Mitiku, Tezeta F Mihai, Chiriac Martin, Joel BMC Med Inform Decis Mak Research Article BACKGROUND: Electronic medical records (EMRs) represent a potentially rich source of health information for research but the free-text in EMRs often contains identifying information. While de-identification tools have been developed for free-text, none have been developed or tested for the full range of primary care EMR data. METHODS: We used deid open source de-identification software and modified it for an Ontario context for use on primary care EMR data. We developed the modified program on a training set of 1000 free-text records from one group practice and then tested it on two validation sets from a random sample of 700 free-text EMR records from 17 different physicians from 7 different practices in 5 different cities and 500 free-text records from a group practice that was in a different city than the group practice that was used for the training set. We measured the sensitivity/recall, precision, specificity, accuracy and F-measure of the modified tool against manually tagged free-text records to remove patient and physician names, locations, addresses, medical record, health card and telephone numbers. RESULTS: We found that the modified training program performed with a sensitivity of 88.3%, specificity of 91.4%, precision of 91.3%, accuracy of 89.9% and F-measure of 0.90. The validation sets had sensitivities of 86.7% and 80.2%, specificities of 91.4% and 87.7%, precisions of 91.1% and 87.4%, accuracies of 89.0% and 83.8% and F-measures of 0.89 and 0.84 for the first and second validation sets, respectively. CONCLUSION: The deid program can be modified to reasonably accurately de-identify free-text primary care EMR records while preserving clinical content.
BioMed Central 2010-06-18 /pmc/articles/PMC2907300/ /pubmed/20565894 http://dx.doi.org/10.1186/1472-6947-10-35 Text en Copyright ©2010 Tu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Tu, Karen Klein-Geltink, Julie Mitiku, Tezeta F Mihai, Chiriac Martin, Joel De-identification of primary care electronic medical records free-text data in Ontario, Canada |
title | De-identification of primary care electronic medical records free-text data in Ontario, Canada |
title_full | De-identification of primary care electronic medical records free-text data in Ontario, Canada |
title_fullStr | De-identification of primary care electronic medical records free-text data in Ontario, Canada |
title_full_unstemmed | De-identification of primary care electronic medical records free-text data in Ontario, Canada |
title_short | De-identification of primary care electronic medical records free-text data in Ontario, Canada |
title_sort | de-identification of primary care electronic medical records free-text data in ontario, canada |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2907300/ https://www.ncbi.nlm.nih.gov/pubmed/20565894 http://dx.doi.org/10.1186/1472-6947-10-35 |
work_keys_str_mv | AT tukaren deidentificationofprimarycareelectronicmedicalrecordsfreetextdatainontariocanada AT kleingeltinkjulie deidentificationofprimarycareelectronicmedicalrecordsfreetextdatainontariocanada AT mitikutezetaf deidentificationofprimarycareelectronicmedicalrecordsfreetextdatainontariocanada AT mihaichiriac deidentificationofprimarycareelectronicmedicalrecordsfreetextdatainontariocanada AT martinjoel deidentificationofprimarycareelectronicmedicalrecordsfreetextdatainontariocanada |
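
The description field reports an F-measure alongside precision and sensitivity/recall for each record set. As a rough consistency check, the sketch below recomputes the F-measure from the reported precision and recall, assuming the authors used the balanced F-measure (F1, the harmonic mean of the two); the abstract does not state which weighting was used, and the function and variable names here are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the record does not say which F-measure variant was used;
    F1 is taken here as the most common choice.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, sensitivity/recall) pairs as reported in the description field.
reported = {
    "training": (0.913, 0.883),
    "validation 1": (0.911, 0.867),
    "validation 2": (0.874, 0.802),
}

# Round to two decimals, matching the precision used in the abstract.
f_scores = {name: round(f_measure(p, r), 2) for name, (p, r) in reported.items()}

for name, score in f_scores.items():
    print(f"{name}: F-measure = {score:.2f}")
```

Under that assumption the recomputed values come out to 0.90, 0.89 and 0.84, which agree with the F-measures given in the description for the training set and the two validation sets.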