Impact of De-Identification on Clinical Text Classification Using Traditional and Deep Learning Classifiers

Clinical text de-identification enables collaborative research while protecting patient privacy and confidentiality; however, concerns persist about the reduction in the utility of the de-identified text for information extraction and machine learning tasks. In the context of a deep learning experim...

Bibliographic Details
Main authors: Obeid, Jihad S., Heider, Paul M., Weeda, Erin R., Matuskowitz, Andrew J., Carr, Christine M., Gagnon, Kevin, Crawford, Tami, Meystre, Stephane M.
Format: Online Article Text
Language: English
Published: 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6779034/
https://www.ncbi.nlm.nih.gov/pubmed/31437930
http://dx.doi.org/10.3233/SHTI190228
author Obeid, Jihad S.
Heider, Paul M.
Weeda, Erin R.
Matuskowitz, Andrew J.
Carr, Christine M.
Gagnon, Kevin
Crawford, Tami
Meystre, Stephane M.
collection PubMed
description Clinical text de-identification enables collaborative research while protecting patient privacy and confidentiality; however, concerns persist about the reduction in the utility of the de-identified text for information extraction and machine learning tasks. In the context of a deep learning experiment to detect altered mental status in emergency department provider notes, we tested several classifiers on clinical notes in their original form and on their automatically de-identified counterpart. We tested both traditional bag-of-words based machine learning models as well as word-embedding based deep learning models. We evaluated the models on 1,113 history of present illness notes. A total of 1,795 protected health information tokens were replaced in the de-identification process across all notes. The deep learning models had the best performance with accuracies of 95% on both original and de-identified notes. However, there was no significant difference in the performance of any of the models on the original vs. the de-identified notes.
format Online
Article
Text
id pubmed-6779034
institution National Center for Biotechnology Information
language English
publishDate 2019
record_format MEDLINE/PubMed
spelling pubmed-6779034
2019-10-07
Impact of De-Identification on Clinical Text Classification Using Traditional and Deep Learning Classifiers
Obeid, Jihad S.
Heider, Paul M.
Weeda, Erin R.
Matuskowitz, Andrew J.
Carr, Christine M.
Gagnon, Kevin
Crawford, Tami
Meystre, Stephane M.
Stud Health Technol Inform
Article
2019-08-21
/pmc/articles/PMC6779034/
/pubmed/31437930
http://dx.doi.org/10.3233/SHTI190228
Text
en
http://creativecommons.org/licenses/by-nc/4.0/
This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
title Impact of De-Identification on Clinical Text Classification Using Traditional and Deep Learning Classifiers
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6779034/
https://www.ncbi.nlm.nih.gov/pubmed/31437930
http://dx.doi.org/10.3233/SHTI190228
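
The abstract reports that 1,795 protected health information (PHI) tokens were replaced across 1,113 notes, which works out to about 1.6 replaced tokens per note on average. The sketch below is only an illustration of why so few replacements may leave bag-of-words features largely unchanged; it is not the authors' de-identification system or classifier. The two regular expressions, the placeholder strings, and the sample note are all hypothetical and were invented for this example.

```python
import re
from collections import Counter

# Hypothetical PHI patterns, for illustration only. A real de-identification
# system relies on far richer rules and/or trained models.
PHI_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[DATE]"),
    (re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.\s+[A-Z][a-z]+\b"), "[NAME]"),
]


def deidentify(text):
    """Replace each PHI match with a placeholder.

    Returns the rewritten text and the number of replacements made.
    """
    replaced = 0
    for pattern, placeholder in PHI_PATTERNS:
        text, count = pattern.subn(placeholder, text)
        replaced += count
    return text, replaced


def bag_of_words(text):
    """Count lowercase word tokens; bracketed placeholders count as tokens."""
    return Counter(re.findall(r"[a-z\[\]]+", text.lower()))


# Invented sample note, not taken from the study data.
note = "Dr. Smith saw the patient on 03/14/2019 for confusion and lethargy."
deid_note, n_replaced = deidentify(note)

original_bow = bag_of_words(note)
deid_bow = bag_of_words(deid_note)

# Tokens whose counts are identical before and after de-identification.
unchanged = {t for t in original_bow if original_bow[t] == deid_bow.get(t, 0)}

print(deid_note)
print("replacements:", n_replaced)
print("unchanged tokens:", sorted(unchanged))
```

In this toy note the clinically relevant words ("confusion", "lethargy") keep the same counts after de-identification, while only the name and date tokens change. Whether that holds for a real corpus depends on the de-identification tool and the notes themselves.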