
A De-identification Method for Bilingual Clinical Texts of Various Note Types

De-identification of personal health information is essential in order not to require written patient informed consent. Previous de-identification methods were proposed using natural language processing technology in order to remove the identifiers in clinical narrative text, although these methods...


Bibliographic Details
Main Authors: Shin, Soo-Yong, Park, Yu Rang, Shin, Yongdon, Choi, Hyo Joung, Park, Jihyun, Lyu, Yongman, Lee, Moo-Song, Choi, Chang-Min, Kim, Woo-Sung, Lee, Jae Ho
Format: Online Article Text
Language: English
Published: The Korean Academy of Medical Sciences 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4278030/
https://www.ncbi.nlm.nih.gov/pubmed/25552878
http://dx.doi.org/10.3346/jkms.2015.30.1.7
_version_ 1782350456665669632
author Shin, Soo-Yong
Park, Yu Rang
Shin, Yongdon
Choi, Hyo Joung
Park, Jihyun
Lyu, Yongman
Lee, Moo-Song
Choi, Chang-Min
Kim, Woo-Sung
Lee, Jae Ho
author_sort Shin, Soo-Yong
collection PubMed
description De-identification of personal health information is essential in order not to require written patient informed consent. Previous de-identification methods were proposed using natural language processing technology in order to remove the identifiers in clinical narrative text, although these methods only focused on narrative text written in English. In this study, we propose a regular expression-based de-identification method used to address bilingual clinical records written in Korean and English. To develop and validate regular expression rules, we obtained training and validation datasets composed of 6,039 clinical notes of 20 types and 5,000 notes of 33 types, respectively. Fifteen regular expression rules were constructed using the development dataset and those rules achieved 99.87% precision and 96.25% recall for the validation dataset. Our de-identification method successfully removed the identifiers in diverse types of bilingual clinical narrative texts. This method will thus assist physicians to more easily perform retrospective research.
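As a rough illustration of the kind of approach the description summarizes, the following Python sketch applies a small ordered list of regular expression rules to a mixed Korean and English note, and computes precision and recall from match counts. This record does not reproduce the paper's fifteen rules, so every pattern, label, and function name here (RULES, deidentify, precision_recall) is a hypothetical stand-in, not the authors' method.

```python
import re

# Hypothetical rules for illustration only; the paper's fifteen rules are not
# listed in this record. Each entry is (replacement label, compiled pattern).
RULES = [
    # Korean resident registration number: 6 digits, hyphen, 7 digits
    ("RRN", re.compile(r"(?<!\d)\d{6}-\d{7}(?!\d)")),
    # Korean mobile phone number such as 010-1234-5678
    ("PHONE", re.compile(r"(?<!\d)01[016789]-\d{3,4}-\d{4}(?!\d)")),
    # Dates such as 2014-12-23 or 2014.12.23
    ("DATE", re.compile(r"(?<!\d)\d{4}[-./]\d{1,2}[-./]\d{1,2}(?!\d)")),
    # English title followed by a capitalized name, e.g. "Dr. Kim"
    ("NAME", re.compile(r"\b(?:Dr|Prof)\.\s+[A-Z][a-z]+")),
    # Two to four Hangul syllables followed by a Korean title
    ("NAME", re.compile(r"[가-힣]{2,4}\s?(?:교수|선생님)")),
]


def deidentify(text):
    """Replace every match of each rule, in order, with its bracketed label."""
    for label, pattern in RULES:
        text = pattern.sub(f"[{label}]", text)
    return text


def precision_recall(tp, fp, fn):
    """Compute precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    note = "담당 김철수 교수, Dr. Kim; RRN 900101-1234567; tel 010-1234-5678; visit 2014-12-23"
    print(deidentify(note))
```

Rule order matters in a sketch like this: the more specific identifier patterns run before the looser date pattern, so a later rule never sees text an earlier rule has already masked.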
format Online
Article
Text
id pubmed-4278030
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher The Korean Academy of Medical Sciences
record_format MEDLINE/PubMed
spelling pubmed-4278030 2015-01-01 J Korean Med Sci Original Article The Korean Academy of Medical Sciences 2015-01 2014-12-23 /pmc/articles/PMC4278030/ /pubmed/25552878 http://dx.doi.org/10.3346/jkms.2015.30.1.7 Text en © 2015 The Korean Academy of Medical Sciences. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A De-identification Method for Bilingual Clinical Texts of Various Note Types
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4278030/
https://www.ncbi.nlm.nih.gov/pubmed/25552878
http://dx.doi.org/10.3346/jkms.2015.30.1.7