Development and evaluation of a de-identification procedure for a case register sourced from mental health electronic records
BACKGROUND: Electronic health records (EHRs) provide enormous potential for health research but also present data governance challenges. Ensuring de-identification is a pre-requisite for use of EHR data without prior consent. The South London and Maudsley NHS Trust (SLaM), one of the largest seconda...
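Main authors: | Fernandes, Andrea C; Cloete, Danielle; Broadbent, Matthew TM; Hayes, Richard D; Chang, Chin-Kuo; Jackson, Richard G; Roberts, Angus; Tsang, Jason; Soncul, Murat; Liebscher, Jennifer; Stewart, Robert; Callard, Felicity |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3751474/ https://www.ncbi.nlm.nih.gov/pubmed/23842533 http://dx.doi.org/10.1186/1472-6947-13-71 |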
_version_ | 1782281605070454784 |
---|---|
author | Fernandes, Andrea C; Cloete, Danielle; Broadbent, Matthew TM; Hayes, Richard D; Chang, Chin-Kuo; Jackson, Richard G; Roberts, Angus; Tsang, Jason; Soncul, Murat; Liebscher, Jennifer; Stewart, Robert; Callard, Felicity |
author_sort | Fernandes, Andrea C |
collection | PubMed |
description | BACKGROUND: Electronic health records (EHRs) provide enormous potential for health research but also present data governance challenges. Ensuring de-identification is a pre-requisite for use of EHR data without prior consent. The South London and Maudsley NHS Trust (SLaM), one of the largest secondary mental healthcare providers in Europe, has developed, from its EHRs, a de-identified psychiatric case register, the Clinical Record Interactive Search (CRIS), for secondary research. METHODS: We describe development, implementation and evaluation of a bespoke de-identification algorithm used to create the register. It is designed to create dictionaries using patient identifiers (PIs) entered into dedicated source fields and then identify, match and mask them (with ZZZZZ) when they appear in medical texts. We deemed this approach would be effective, given high coverage of PI in the dedicated fields and the effectiveness of the masking combined with elements of a security model. We conducted two separate performance tests i) to test performance of the algorithm in masking individual true PIs entered in dedicated fields and then found in text (using 500 patient notes) and ii) to compare the performance of the CRIS pattern matching algorithm with a machine learning algorithm, called the MITRE Identification Scrubber Toolkit – MIST (using 70 patient notes – 50 notes to train, 20 notes to test on). We also report any incidences of potential breaches, defined by occurrences of 3 or more true or apparent PIs in the same patient’s notes (and in an additional set of longitudinal notes for 50 patients); and we consider the possibility of inferring information despite de-identification. RESULTS: True PIs were masked with 98.8% precision and 97.6% recall. As anticipated, potential PIs did appear, owing to misspellings entered within the EHRs. We found one potential breach. In a separate performance test, with a different set of notes, CRIS yielded 100% precision and 88.5% recall, while MIST yielded 95.1% and 78.1%, respectively. We discuss how we overcome the realistic possibility – albeit of low probability – of potential breaches through implementation of the security model. CONCLUSION: CRIS is a de-identified psychiatric database sourced from EHRs, which protects patient anonymity and maximises data available for research. CRIS demonstrates the advantage of combining an effective de-identification algorithm with a carefully designed security model. The paper advances much needed discussion of EHR de-identification – particularly in relation to criteria to assess de-identification, and considering the contexts of de-identified research databases when assessing the risk of breaches of confidential patient information. |
format | Online Article Text |
id | pubmed-3751474 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3751474 2013-08-24 Development and evaluation of a de-identification procedure for a case register sourced from mental health electronic records Fernandes, Andrea C; Cloete, Danielle; Broadbent, Matthew TM; Hayes, Richard D; Chang, Chin-Kuo; Jackson, Richard G; Roberts, Angus; Tsang, Jason; Soncul, Murat; Liebscher, Jennifer; Stewart, Robert; Callard, Felicity BMC Med Inform Decis Mak Research Article BioMed Central 2013-07-11 /pmc/articles/PMC3751474/ /pubmed/23842533 http://dx.doi.org/10.1186/1472-6947-13-71 Text en Copyright © 2013 Fernandes et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Development and evaluation of a de-identification procedure for a case register sourced from mental health electronic records |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3751474/ https://www.ncbi.nlm.nih.gov/pubmed/23842533 http://dx.doi.org/10.1186/1472-6947-13-71 |
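
The description above outlines a dictionary-based approach: identifiers entered in dedicated fields are collected and then matched and masked with ZZZZZ wherever they appear in free text, with performance reported as precision and recall. The following is a minimal illustrative sketch of that general idea, not the CRIS implementation. The function names (`build_pattern`, `mask_text`, `precision_recall`), the case-insensitive whole-term matching, and the sample note are all assumptions made for the example.

```python
import re
from typing import Iterable, Optional, Tuple

MASK = "ZZZZZ"  # masking token named in the abstract


def build_pattern(identifiers: Iterable[str]) -> Optional[re.Pattern]:
    """Compile one regex from a per-patient dictionary of identifiers.

    Hypothetical helper: longer terms are tried first so that a full
    address is masked as a unit rather than piece by piece.
    """
    terms = sorted({t.strip() for t in identifiers if t.strip()},
                   key=len, reverse=True)
    if not terms:
        return None
    alternation = "|".join(re.escape(t) for t in terms)
    # Match whole terms only, ignoring case (an assumption of this sketch).
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def mask_text(text: str, identifiers: Iterable[str]) -> str:
    """Replace every dictionary term found in the text with MASK."""
    pattern = build_pattern(identifiers)
    return text if pattern is None else pattern.sub(MASK, text)


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented note and identifiers, for illustration only.
    note = "John Smith lives at 12 Elm Road; john's mother called."
    print(mask_text(note, ["John", "Smith", "12 Elm Road"]))
    # A misspelled identifier is not in the dictionary, so it is not masked.
    print(mask_text("Jon attended clinic.", ["John"]))
    # Made-up counts: 8 identifiers masked correctly, 0 wrongly, 2 missed.
    print(precision_recall(tp=8, fp=0, fn=2))
```

Exact matching of this kind cannot catch a misspelled identifier, which is consistent with the abstract's observation that potential PIs appeared owing to misspellings, and is one reason the authors pair masking with a security model.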