Re-examination of Rule-Based Methods in Deidentification of Electronic Health Records: Algorithm Development and Validation

Bibliographic Details
Main authors: Zhao, Zhenyu, Yang, Muyun, Tang, Buzhou, Zhao, Tiejun
Format: Online Article Text
Language: English
Published: JMIR Publications 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7226054/
https://www.ncbi.nlm.nih.gov/pubmed/32352384
http://dx.doi.org/10.2196/17622
author Zhao, Zhenyu
Yang, Muyun
Tang, Buzhou
Zhao, Tiejun
collection PubMed
description BACKGROUND: Deidentification of clinical records is a critical step before their publication. This is usually treated as a type of sequence labeling task, and ensemble learning is one of the best performing solutions. Under the framework of multi-learner ensemble, the significance of a candidate rule-based learner remains an open issue. OBJECTIVE: The aim of this study is to investigate whether a rule-based learner is useful in a hybrid deidentification system and offer suggestions on how to build and integrate a rule-based learner. METHODS: We chose a data-driven rule-learner named transformation-based error-driven learning (TBED) and integrated it into the best performing hybrid system in this task. RESULTS: On the popular Informatics for Integrating Biology and the Bedside (i2b2) deidentification data set, experiments showed that TBED can offer high performance with its generated rules, and integrating the rule-based model into an ensemble framework, which reached an F1 score of 96.76%, achieved the best performance reported in the community. CONCLUSIONS: We proved the rule-based method offers an effective contribution to the current ensemble learning approach for the deidentification of clinical records. Such a rule system could be automatically learned by TBED, avoiding the high cost and low reliability of manual rule composition. In particular, we boosted the ensemble model with rules to create the best performance of the deidentification of clinical records.
format Online
Article
Text
id pubmed-7226054
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-7226054 2020-05-19 JMIR Med Inform Original Paper JMIR Publications 2020-04-30 /pmc/articles/PMC7226054/ /pubmed/32352384 http://dx.doi.org/10.2196/17622 Text en ©Zhenyu Zhao, Muyun Yang, Buzhou Tang, Tiejun Zhao. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 30.04.2020.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title Re-examination of Rule-Based Methods in Deidentification of Electronic Health Records: Algorithm Development and Validation
topic Original Paper