A De-identification Method for Bilingual Clinical Texts of Various Note Types
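Main authors: | Shin, Soo-Yong; Park, Yu Rang; Shin, Yongdon; Choi, Hyo Joung; Park, Jihyun; Lyu, Yongman; Lee, Moo-Song; Choi, Chang-Min; Kim, Woo-Sung; Lee, Jae Ho |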
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Korean Academy of Medical Sciences, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4278030/ https://www.ncbi.nlm.nih.gov/pubmed/25552878 http://dx.doi.org/10.3346/jkms.2015.30.1.7 |
field | value |
---|---|
author | Shin, Soo-Yong; Park, Yu Rang; Shin, Yongdon; Choi, Hyo Joung; Park, Jihyun; Lyu, Yongman; Lee, Moo-Song; Choi, Chang-Min; Kim, Woo-Sung; Lee, Jae Ho |
collection | PubMed |
description | De-identification of personal health information is essential if clinical data are to be used without written informed consent from patients. Previous de-identification methods used natural language processing to remove identifiers from clinical narrative text, but they focused only on text written in English. In this study, we propose a regular expression-based de-identification method for bilingual clinical records written in Korean and English. To develop and validate the regular expression rules, we obtained a training dataset of 6,039 clinical notes of 20 types and a validation dataset of 5,000 notes of 33 types. Fifteen regular expression rules were constructed from the training dataset; on the validation dataset they achieved 99.87% precision and 96.25% recall. Our method successfully removed identifiers from diverse types of bilingual clinical narrative text, and it should make it easier for physicians to perform retrospective research. |
format | Online Article Text |
id | pubmed-4278030 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | The Korean Academy of Medical Sciences |
record_format | MEDLINE/PubMed |
journal | J Korean Med Sci, Original Article |
dates | 2015-01; 2014-12-23 |
rights | © 2015 The Korean Academy of Medical Sciences. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A De-identification Method for Bilingual Clinical Texts of Various Note Types |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4278030/ https://www.ncbi.nlm.nih.gov/pubmed/25552878 http://dx.doi.org/10.3346/jkms.2015.30.1.7 |
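The abstract describes the approach only at a high level: hand-written regular expression rules that mask identifiers in notes mixing Korean and English, scored by precision and recall. The Python sketch below illustrates that general technique under stated assumptions. The four rules in `RULES`, the label words they key on (`환자명`, `성명`, `Name`), and the helpers `deidentify` and `precision_recall` are hypothetical illustrations, not the authors' fifteen rules. The record also does not say whether the reported scores were counted per token, per span, or per note.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical example rules for illustration only; NOT the paper's 15 rules.
# Each pattern exposes the identifier itself as the named group "id".
RULES: Dict[str, re.Pattern] = {
    # Korean resident registration number: YYMMDD birth date, hyphen, 7 digits.
    "RRN": re.compile(r"\b(?P<id>\d{6}-\d{7})\b"),
    # Korean phone numbers such as 010-1234-5678 or 02-123-4567.
    "PHONE": re.compile(r"\b(?P<id>0\d{1,2}-\d{3,4}-\d{4})\b"),
    # Dates such as 2014-12-23, 2014.12.23 or 2014/12/23.
    "DATE": re.compile(r"\b(?P<id>\d{4}[-./]\d{1,2}[-./]\d{1,2})\b"),
    # A Hangul (U+AC00..U+D7A3) or romanized name following a Korean or English label.
    "NAME": re.compile(
        r"(?:환자명|성명|Name)\s*:\s*(?P<id>[가-힣]{2,4}|[A-Z][a-z]+(?: [A-Z][a-z]+)?)"
    ),
}


def deidentify(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Apply the rules in order; return masked text and (tag, value) hits."""
    hits: List[Tuple[str, str]] = []
    for tag, pattern in RULES.items():
        def mask(m: re.Match, tag: str = tag) -> str:
            hits.append((tag, m.group("id")))
            # Replace only the "id" group so context such as "Name:" survives.
            rel_start = m.start("id") - m.start()
            rel_end = m.end("id") - m.start()
            whole = m.group(0)
            return whole[:rel_start] + f"[{tag}]" + whole[rel_end:]
        text = pattern.sub(mask, text)
    return text, hits


def precision_recall(
    predicted: Set[Tuple[str, str]], gold: Set[Tuple[str, str]]
) -> Tuple[float, float]:
    """Precision = TP / |predicted|, recall = TP / |gold| over (tag, value) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    note = ("환자명: 홍길동, 주민번호 800101-1234567, 연락처 010-1234-5678. "
            "Admitted 2014-12-23. Name: John Smith")
    masked, hits = deidentify(note)
    print(masked)
    # → 환자명: [NAME], 주민번호 [RRN], 연락처 [PHONE]. Admitted [DATE]. Name: [NAME]
    # Hypothetical gold standard containing one identifier the rules miss.
    gold = set(hits) | {("ID", "12345678")}
    print(precision_recall(set(hits), gold))  # precision 1.0, recall 5/6
```

Applying the rules in a fixed order means a span already replaced by a placeholder such as `[RRN]` contains no digits, so the later phone and date rules cannot re-match inside it. Label-anchored rules like the hypothetical `NAME` pattern miss any name that appears without a preceding label, so rules of this kind tend to favor precision over recall.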