CRFs based de-identification of medical records
De-identification is a shared task of the 2014 i2b2/UTHealth challenge. The purpose of this task is to remove protected health information (PHI) from medical records. In this paper, we propose a novel de-identifier, WI-deId, based on conditional random fields (CRFs). A preprocessing module, which to...
Main authors: | He, Bin; Guan, Yi; Cheng, Jianyi; Cen, Keting; Hua, Wenlan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4988860/ https://www.ncbi.nlm.nih.gov/pubmed/26315662 http://dx.doi.org/10.1016/j.jbi.2015.08.012 |
_version_ | 1782448484000989184 |
---|---|
author | He, Bin Guan, Yi Cheng, Jianyi Cen, Keting Hua, Wenlan |
author_facet | He, Bin Guan, Yi Cheng, Jianyi Cen, Keting Hua, Wenlan |
author_sort | He, Bin |
collection | PubMed |
description | De-identification is a shared task of the 2014 i2b2/UTHealth challenge. The purpose of this task is to remove protected health information (PHI) from medical records. In this paper, we propose a novel de-identifier, WI-deId, based on conditional random fields (CRFs). A preprocessing module, which tokenizes the medical records using regular expressions and an off-the-shelf tokenizer, is introduced, and three groups of features are extracted to train the de-identifier model. The experiment shows that our system is effective in the de-identification of medical records, achieving a micro-F1 of 0.9232 at the i2b2 strict entity evaluation level. |
format | Online Article Text |
id | pubmed-4988860 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
record_format | MEDLINE/PubMed |
spelling | pubmed-49888602016-08-17 CRFs based de-identification of medical records He, Bin Guan, Yi Cheng, Jianyi Cen, Keting Hua, Wenlan J Biomed Inform Article De-identification is a shared task of the 2014 i2b2/UTHealth challenge. The purpose of this task is to remove protected health information (PHI) from medical records. In this paper, we propose a novel de-identifier, WI-deId, based on conditional random fields (CRFs). A preprocessing module, which tokenizes the medical records using regular expressions and an off-the-shelf tokenizer, is introduced, and three groups of features are extracted to train the de-identifier model. The experiment shows that our system is effective in the de-identification of medical records, achieving a micro-F1 of 0.9232 at the i2b2 strict entity evaluation level. 2015-08-24 2015-12 /pmc/articles/PMC4988860/ /pubmed/26315662 http://dx.doi.org/10.1016/j.jbi.2015.08.012 Text en http://creativecommons.org/licenses/by-nc-nd/4.0/ This manuscript version is made available under the CC BY-NC-ND 4.0 license. |
spellingShingle | Article He, Bin Guan, Yi Cheng, Jianyi Cen, Keting Hua, Wenlan CRFs based de-identification of medical records |
title | CRFs based de-identification of medical records |
title_full | CRFs based de-identification of medical records |
title_fullStr | CRFs based de-identification of medical records |
title_full_unstemmed | CRFs based de-identification of medical records |
title_short | CRFs based de-identification of medical records |
title_sort | crfs based de-identification of medical records |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4988860/ https://www.ncbi.nlm.nih.gov/pubmed/26315662 http://dx.doi.org/10.1016/j.jbi.2015.08.012 |