Topic Modeling for Interpretable Text Classification From EHRs
The clinical notes in electronic health records have many possibilities for predictive tasks in text classification. The interpretability of these classification models for the clinical domain is critical for decision making. Using topic models for text classification of electronic health records fo...
Main Authors: | Rijcken, Emil; Kaymak, Uzay; Scheepers, Floortje; Mosteiro, Pablo; Zervanou, Kalliopi; Spruit, Marco |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Big Data |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9114871/ https://www.ncbi.nlm.nih.gov/pubmed/35600326 http://dx.doi.org/10.3389/fdata.2022.846930 |
_version_ | 1784709874858328064 |
---|---|
author | Rijcken, Emil Kaymak, Uzay Scheepers, Floortje Mosteiro, Pablo Zervanou, Kalliopi Spruit, Marco |
author_sort | Rijcken, Emil |
collection | PubMed |
description | The clinical notes in electronic health records have many possibilities for predictive tasks in text classification. The interpretability of these classification models for the clinical domain is critical for decision making. Using topic models for text classification of electronic health records for a predictive task allows for the use of topics as features, thus making the text classification more interpretable. However, selecting the most effective topic model is not trivial. In this work, we propose considerations for selecting a suitable topic model based on the predictive performance and interpretability measure for text classification. We compare 17 different topic models in terms of both interpretability and predictive performance in an inpatient violence prediction task using clinical notes. We find no correlation between interpretability and predictive performance. In addition, our results show that although no model outperforms the other models on both variables, our proposed fuzzy topic modeling algorithm (FLSA-W) performs best in most settings for interpretability, whereas two state-of-the-art methods (ProdLDA and LSI) achieve the best predictive performance. |
format | Online Article Text |
id | pubmed-9114871 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9114871 2022-05-19 Topic Modeling for Interpretable Text Classification From EHRs Rijcken, Emil Kaymak, Uzay Scheepers, Floortje Mosteiro, Pablo Zervanou, Kalliopi Spruit, Marco Front Big Data Big Data The clinical notes in electronic health records have many possibilities for predictive tasks in text classification. The interpretability of these classification models for the clinical domain is critical for decision making. Using topic models for text classification of electronic health records for a predictive task allows for the use of topics as features, thus making the text classification more interpretable. However, selecting the most effective topic model is not trivial. In this work, we propose considerations for selecting a suitable topic model based on the predictive performance and interpretability measure for text classification. We compare 17 different topic models in terms of both interpretability and predictive performance in an inpatient violence prediction task using clinical notes. We find no correlation between interpretability and predictive performance. In addition, our results show that although no model outperforms the other models on both variables, our proposed fuzzy topic modeling algorithm (FLSA-W) performs best in most settings for interpretability, whereas two state-of-the-art methods (ProdLDA and LSI) achieve the best predictive performance. Frontiers Media S.A. 2022-05-04 /pmc/articles/PMC9114871/ /pubmed/35600326 http://dx.doi.org/10.3389/fdata.2022.846930 Text en Copyright © 2022 Rijcken, Kaymak, Scheepers, Mosteiro, Zervanou and Spruit. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Big Data Rijcken, Emil Kaymak, Uzay Scheepers, Floortje Mosteiro, Pablo Zervanou, Kalliopi Spruit, Marco Topic Modeling for Interpretable Text Classification From EHRs |
title | Topic Modeling for Interpretable Text Classification From EHRs |
topic | Big Data |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9114871/ https://www.ncbi.nlm.nih.gov/pubmed/35600326 http://dx.doi.org/10.3389/fdata.2022.846930 |