
CRFs based de-identification of medical records



Bibliographic Details
Main authors: He, Bin; Guan, Yi; Cheng, Jianyi; Cen, Keting; Hua, Wenlan
Format: Online, Article, Text
Language: English
Published: 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4988860/
https://www.ncbi.nlm.nih.gov/pubmed/26315662
http://dx.doi.org/10.1016/j.jbi.2015.08.012
collection PubMed
description De-identification is a shared task of the 2014 i2b2/UTHealth challenge. The purpose of this task is to remove protected health information (PHI) from medical records. In this paper, we propose a novel de-identifier, WI-deId, based on conditional random fields (CRFs). A preprocessing module, which tokenizes the medical records using regular expressions and an off-the-shelf tokenizer, is introduced, and three groups of features are extracted to train the de-identifier model. The experiment shows that our system is effective in the de-identification of medical records, achieving a micro-F1 of 0.9232 at the i2b2 strict entity evaluation level.
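The abstract reports performance as micro-F1 at the i2b2 strict entity evaluation level. As an illustration only, the Python sketch below computes such a score under the assumption that a predicted PHI span counts as correct only when its start offset, end offset and PHI type all exactly match a gold annotation, with counts pooled over every document and PHI type. The `Span` tuple, the `strict_micro_f1` function and the example offsets are hypothetical; this is not the official i2b2 evaluation script or the authors' code.

```python
from collections import namedtuple

# Hypothetical PHI annotation: character offsets [start, end) plus a PHI type label.
Span = namedtuple("Span", ["start", "end", "phi_type"])


def strict_micro_f1(gold_docs, pred_docs):
    """Return (precision, recall, f1) under strict, micro-averaged matching.

    gold_docs and pred_docs are parallel lists with one entry per document;
    each entry is an iterable of Span. A predicted span is a true positive only
    if the same document contains a gold span with identical start, end and
    phi_type. Counts are pooled over all documents and PHI types (micro-average).
    """
    if len(gold_docs) != len(pred_docs):
        raise ValueError("gold_docs and pred_docs must cover the same documents")

    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        gold_set, pred_set = set(gold), set(pred)
        matched = len(gold_set & pred_set)  # exact offset + type matches
        tp += matched
        fp += len(pred_set) - matched       # predictions with no exact gold match
        fn += len(gold_set) - matched       # gold spans never matched exactly

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# One document: the DATE is found exactly, the NAME boundary is off by a few
# characters (so it counts as both a false positive and a false negative),
# and a spurious AGE span is predicted.
gold = [[Span(0, 10, "DATE"), Span(15, 24, "NAME")]]
pred = [[Span(0, 10, "DATE"), Span(15, 20, "NAME"), Span(30, 35, "AGE")]]
print(strict_micro_f1(gold, pred))  # approximately (0.333, 0.5, 0.4)
```

Because micro-averaging pools counts before computing precision and recall, frequent PHI types influence the score more than rare ones, and under strict matching a single boundary error costs both a false positive and a false negative.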
id pubmed-4988860
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal J Biomed Inform
dates 2016-08-17, 2015-08-24, 2015-12
rights This manuscript version is made available under the CC BY-NC-ND 4.0 license: http://creativecommons.org/licenses/by-nc-nd/4.0/