Web-Based Application Based on Human-in-the-Loop Deep Learning for Deidentifying Free-Text Data in Electronic Medical Records: Development and Usability Study

Bibliographic details
Main authors: Liu, Leibo, Perez-Concha, Oscar, Nguyen, Anthony, Bennett, Vicki, Blake, Victoria, Gallego, Blanca, Jorm, Louisa
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10492176/
https://www.ncbi.nlm.nih.gov/pubmed/37624624
http://dx.doi.org/10.2196/46322
author Liu, Leibo
Perez-Concha, Oscar
Nguyen, Anthony
Bennett, Vicki
Blake, Victoria
Gallego, Blanca
Jorm, Louisa
collection PubMed
description BACKGROUND: The narrative free-text data in electronic medical records (EMRs) contain valuable clinical information for analysis and research to inform better patient care. However, the release of free text for secondary use is hindered by concerns surrounding personally identifiable information (PII), as protecting individuals' privacy is paramount. Therefore, it is necessary to deidentify free text to remove PII. Manual deidentification is a time-consuming and labor-intensive process. Numerous automated deidentification approaches and systems have been attempted to overcome this challenge over the past decade. OBJECTIVE: We sought to develop an accurate, web-based system deidentifying free text (DEFT), which can be readily and easily adopted in real-world settings for deidentification of free text in EMRs. The system has several key features including a simple and task-focused web user interface, customized PII types, use of a state-of-the-art deep learning model for tagging PII from free text, preannotation by an interactive learning loop, rapid manual annotation with autosave, support for project management and team collaboration, user access control, and central data storage. METHODS: DEFT comprises frontend and backend modules and communicates with central data storage through filesystem path access. The frontend web user interface provides end users with a user-friendly workspace for managing and annotating free text. The backend module processes the requests from the frontend and performs relevant persistence operations. DEFT manages the deidentification workflow as a project, which can contain one or more data sets. Customized PII types and user access control can also be configured. The deep learning model is based on a Bidirectional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF) with RoBERTa as the word embedding layer. 
The interactive learning loop is further integrated into DEFT to speed up the deidentification process and increase its performance over time. RESULTS: DEFT has many advantages over existing deidentification systems in terms of its support for project management, user access control, data management, and an interactive learning process. Experimental results from DEFT on the 2014 i2b2 data set obtained the highest performance compared to 5 benchmark models in terms of microaverage strict entity–level recall and F1-scores of 0.9563 and 0.9627, respectively. In a real-world use case of deidentifying clinical notes extracted from 1 referral hospital in Sydney, New South Wales, Australia, DEFT achieved a high microaverage strict entity–level F1-score of 0.9507 on a corpus of 600 annotated clinical notes. Moreover, the manual annotation process with preannotation demonstrated a 43% increase in work efficiency compared to the process without preannotation. CONCLUSIONS: DEFT is designed for health domain researchers and data custodians to easily deidentify free text in EMRs. DEFT supports an interactive learning loop, and end users with minimal technical knowledge can perform the deidentification work with only a shallow learning curve.
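The abstract frames deidentification as removing PII from free text once it has been tagged. As an illustration only, the sketch below assumes PII has already been tagged as non-overlapping character spans (start, exclusive end, PII type) and replaces each span with a `[TYPE]` placeholder. The span format, the placeholder style, and the `mask_pii` name are hypothetical; they do not describe how DEFT stores or exports its annotations.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, exclusive end offset, PII type)
Span = Tuple[int, int, str]


def mask_pii(text: str, spans: List[Span]) -> str:
    """Replace each tagged PII span with a [TYPE] placeholder.

    Spans must not overlap. They are applied from right to left so that
    replacing one span never shifts the offsets of spans still to be applied.
    """
    masked = text
    for start, end, pii_type in sorted(spans, key=lambda span: span[0], reverse=True):
        masked = masked[:start] + f"[{pii_type}]" + masked[end:]
    return masked


# Fictitious note; offsets are looked up rather than hard-coded
note = "Mr John Smith was admitted on 12/03/2021."
name_start = note.index("John Smith")
date_start = note.index("12/03/2021")
spans = [(name_start, name_start + len("John Smith"), "NAME"),
         (date_start, date_start + len("12/03/2021"), "DATE")]
print(mask_pii(note, spans))  # Mr [NAME] was admitted on [DATE].
```

Applying spans right to left is a simple way to avoid recomputing offsets after each replacement, since a placeholder is usually a different length from the text it replaces.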
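The model is described as a BiLSTM-CRF with RoBERTa as the word embedding layer. In that kind of architecture the BiLSTM produces a score for every tag at every token, and the CRF layer adds tag-to-tag transition scores and decodes the best-scoring tag sequence with the Viterbi algorithm. The sketch below shows only that decoding step, in plain Python with hand-set scores. The tag set (O, B-NAME, I-NAME), the scores, and the `viterbi_decode` name are illustrative assumptions, and the start/end transition scores that many CRF implementations also learn are omitted.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence of a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (e.g. from a BiLSTM layer).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    # best[j]: score of the best path that ends in tag j at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at token t
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag
    path = [max(range(n_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative tag set: 0 = O, 1 = B-NAME, 2 = I-NAME
emissions = [[2.0, 0.0, 0.0],   # token 0 looks like O
             [0.0, 1.0, 1.5],   # token 1 slightly prefers I-NAME on its own
             [2.0, 0.0, 0.0]]   # token 2 looks like O
transitions = [[0.0, 0.0, -10.0],  # O -> I-NAME is heavily penalized
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
print(viterbi_decode(emissions, transitions))  # [0, 1, 0]
```

A per-token argmax over these emissions would give [0, 2, 0] (O, I-NAME, O), which is not a valid BIO sequence; the transition penalty steers the decoder to O, B-NAME, O instead. This is the practical benefit of a CRF output layer over independent per-token classification.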
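In the abstract, the interactive learning loop preannotates notes so that annotators mostly correct suggestions rather than tag from scratch, and the model is said to improve over time. A minimal sketch of that cycle follows, assuming batch-wise retraining on all notes reviewed so far. `MemorizingTagger` is a hypothetical stand-in that simply remembers confirmed PII strings (DEFT uses its BiLSTM-CRF model instead), `review` simulates the human annotator, and the notes are fictitious.

```python
from typing import Callable, Dict, List, Set, Tuple

# An annotation set: {(surface string, PII type), ...}
Annotation = Set[Tuple[str, str]]


class MemorizingTagger:
    """Hypothetical stand-in for the deep learning model: it remembers
    every (string, PII type) pair confirmed by annotators."""

    def __init__(self) -> None:
        self.lexicon: Dict[str, str] = {}

    def predict(self, note: str) -> Annotation:
        return {(s, t) for s, t in self.lexicon.items() if s in note}

    def train(self, reviewed: List[Annotation]) -> None:
        for annotation in reviewed:
            for surface, pii_type in annotation:
                self.lexicon[surface] = pii_type


def interactive_loop(batches: List[List[str]],
                     review: Callable[[str, Annotation], Annotation],
                     model: MemorizingTagger) -> List[int]:
    """Preannotate each batch, let a reviewer correct it, then retrain.

    Returns the number of corrections (annotations added or removed by
    the reviewer) needed in each round.
    """
    reviewed: List[Annotation] = []
    corrections_per_round: List[int] = []
    for batch in batches:
        corrections = 0
        for note in batch:
            preannotated = model.predict(note)        # model suggests PII
            final = review(note, preannotated)        # human accepts or edits
            corrections += len(preannotated ^ final)  # symmetric difference
            reviewed.append(final)
        model.train(reviewed)  # retrain on everything reviewed so far
        corrections_per_round.append(corrections)
    return corrections_per_round


# Simulated annotator: always returns the gold annotation for a note
GOLD: Dict[str, Annotation] = {
    "Alice Tan seen in clinic": {("Alice Tan", "NAME")},
    "Bob Lee reviewed": {("Bob Lee", "NAME")},
    "Alice Tan discharged": {("Alice Tan", "NAME")},
    "Bob Lee called back": {("Bob Lee", "NAME")},
}
batches = [["Alice Tan seen in clinic", "Bob Lee reviewed"],
           ["Alice Tan discharged", "Bob Lee called back"]]
print(interactive_loop(batches, lambda note, pre: GOLD[note], MemorizingTagger()))
# [2, 0]
```

The falling correction count per round is the intuition behind the efficiency gain from preannotation; the sketch does not reproduce or model the reported 43% figure.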
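The results are reported as microaverage strict entity-level recall and F1-score. Under a strict criterion, a predicted entity counts as a true positive only if its start, end, and PII type all match a gold annotation exactly; microaveraging pools true positives, false positives, and false negatives over all notes and PII types before the ratios are computed. The sketch below implements those standard definitions; the tuple layout and the `strict_micro_prf` name are assumptions, and details of the paper's own evaluation (such as offset conventions) are not described in this record.

```python
from typing import Iterable, Tuple

# Hypothetical entity format: (note id, start offset, end offset, PII type)
Entity = Tuple[str, int, int, str]


def strict_micro_prf(gold: Iterable[Entity],
                     predicted: Iterable[Entity]) -> Tuple[float, float, float]:
    """Microaveraged strict entity-level precision, recall, and F1.

    A prediction is a true positive only if note id, both offsets, and PII
    type all match a gold entity exactly. Counts are pooled over all notes
    and PII types before the ratios are taken (microaveraging).
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)
    fp = len(pred_set - gold_set)
    fn = len(gold_set - pred_set)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold = [("note1", 3, 13, "NAME"), ("note1", 30, 40, "DATE"),
        ("note2", 0, 12, "HOSPITAL")]
predicted = [("note1", 3, 13, "NAME"), ("note1", 30, 40, "DATE"),
             ("note2", 0, 8, "HOSPITAL")]  # right type, wrong end offset
# 2 true positives, 1 false positive, 1 false negative: each score is 2/3
print(strict_micro_prf(gold, predicted))
```

The boundary error on the third entity is penalized twice, once as a false positive and once as a false negative, which is why strict scores are usually lower than lenient (overlap-based) ones.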
format Online
Article
Text
id pubmed-10492176
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-10492176 2023-09-10 Interact J Med Res Original Paper JMIR Publications 2023-08-25 /pmc/articles/PMC10492176/ /pubmed/37624624 http://dx.doi.org/10.2196/46322 Text en ©Leibo Liu, Oscar Perez-Concha, Anthony Nguyen, Vicki Bennett, Victoria Blake, Blanca Gallego, Louisa Jorm. Originally published in the Interactive Journal of Medical Research (https://www.i-jmr.org/), 25.08.2023.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Interactive Journal of Medical Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.i-jmr.org/, as well as this copyright and license information must be included.
title Web-Based Application Based on Human-in-the-Loop Deep Learning for Deidentifying Free-Text Data in Electronic Medical Records: Development and Usability Study
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10492176/
https://www.ncbi.nlm.nih.gov/pubmed/37624624
http://dx.doi.org/10.2196/46322