
Impact of De-Identification on Clinical Text Classification Using Traditional and Deep Learning Classifiers


Bibliographic Details
Main authors: Obeid, Jihad S., Heider, Paul M., Weeda, Erin R., Matuskowitz, Andrew J., Carr, Christine M., Gagnon, Kevin, Crawford, Tami, Meystre, Stephane M.
Format: Online Article Text
Language: English
Published: 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6779034/
https://www.ncbi.nlm.nih.gov/pubmed/31437930
http://dx.doi.org/10.3233/SHTI190228
Description
Summary: Clinical text de-identification enables collaborative research while protecting patient privacy and confidentiality; however, concerns persist that de-identification reduces the utility of the text for information extraction and machine learning tasks. In the context of a deep learning experiment to detect altered mental status in emergency department provider notes, we tested several classifiers on clinical notes in their original form and on their automatically de-identified counterparts. We tested both traditional bag-of-words machine learning models and word-embedding-based deep learning models. We evaluated the models on 1,113 history of present illness notes. Across all notes, the de-identification process replaced a total of 1,795 protected health information tokens. The deep learning models performed best, with accuracies of 95% on both original and de-identified notes. None of the models, however, showed a significant difference in performance between the original and the de-identified notes.