Topic Modeling for Interpretable Text Classification From EHRs

The clinical notes in electronic health records offer many possibilities for predictive text classification tasks. For the clinical domain, the interpretability of such classification models is critical for decision making. Using topic models for text classification of electronic health records allows topics to be used as features, making the classification more interpretable. However, selecting the most effective topic model is not trivial. In this work, we propose considerations for selecting a suitable topic model based on predictive performance and an interpretability measure for text classification. We compare 17 topic models in terms of both interpretability and predictive performance on an inpatient violence prediction task using clinical notes. We find no correlation between interpretability and predictive performance. In addition, our results show that although no model outperforms the others on both variables, our proposed fuzzy topic modeling algorithm (FLSA-W) performs best in most settings for interpretability, whereas two state-of-the-art methods (ProdLDA and LSI) achieve the best predictive performance.
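
As a rough illustration of the general idea in the abstract, the sketch below fits a topic model and feeds the resulting document-topic proportions to a classifier as interpretable features. This is only a minimal sketch of that pipeline, not the authors' implementation: it uses scikit-learn's LatentDirichletAllocation and LogisticRegression on placeholder documents and labels, whereas the paper compares 17 topic models (including FLSA-W, ProdLDA, and LSI) on real clinical notes for violence prediction.

# Minimal sketch (not the paper's method): topic proportions as interpretable
# features for a downstream classifier. Documents and labels are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

docs = [
    "patient calm during intake no incidents reported",
    "patient agitated threatened staff during evening round",
    "quiet night patient slept well no concerns",
    "verbal aggression towards nurse restraint considered",
]
labels = [0, 1, 0, 1]  # placeholder outcome, e.g. violence incident yes/no

# Bag-of-words counts, then a topic model over the counts.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # rows are per-document topic proportions

# The topic proportions, not raw tokens, are the classifier's features,
# so each learned coefficient refers to an inspectable topic.
clf = LogisticRegression().fit(doc_topics, labels)
print(clf.predict(doc_topics))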

Bibliographic Details
Main Authors: Rijcken, Emil; Kaymak, Uzay; Scheepers, Floortje; Mosteiro, Pablo; Zervanou, Kalliopi; Spruit, Marco
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2022-05-04
Journal: Front Big Data
Subjects: Big Data
Source: PubMed Central (PMC9114871); MEDLINE/PubMed
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9114871/
https://www.ncbi.nlm.nih.gov/pubmed/35600326
http://dx.doi.org/10.3389/fdata.2022.846930
Rights: Copyright © 2022 Rijcken, Kaymak, Scheepers, Mosteiro, Zervanou and Spruit. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY, https://creativecommons.org/licenses/by/4.0/). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.