Improving Diabetes-Related Biomedical Literature Exploration in the Clinical Decision-making Process via Interactive Classification and Topic Discovery: Methodology Development Study

BACKGROUND: The amount of available textual health data, such as scientific and biomedical literature, is constantly growing, making it more and more challenging for health professionals to properly summarize those data and practice evidence-based clinical decision making. Moreover, the exploration...

Bibliographic Details
Main Authors: Ahne, Adrian; Fagherazzi, Guy; Tannier, Xavier; Czernichow, Thomas; Orchard, Francisco
Format: Online Article Text
Language: English
Published: JMIR Publications 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8808347/
https://www.ncbi.nlm.nih.gov/pubmed/35040795
http://dx.doi.org/10.2196/27434
author Ahne, Adrian
Fagherazzi, Guy
Tannier, Xavier
Czernichow, Thomas
Orchard, Francisco
collection PubMed
description BACKGROUND: The amount of available textual health data, such as scientific and biomedical literature, is constantly growing, making it more and more challenging for health professionals to properly summarize those data and practice evidence-based clinical decision making. Moreover, the exploration of unstructured health text data is challenging for professionals without computer science knowledge due to limited time, resources, and skills. Current tools to explore text data lack ease of use, require high computational efforts, and have difficulty incorporating domain knowledge and focusing on topics of interest.
OBJECTIVE: We developed a methodology able to explore and target topics of interest via an interactive user interface for health professionals with limited computer science knowledge. We aim to reach near state-of-the-art performance while reducing memory consumption, increasing scalability, and minimizing user interaction effort to improve the clinical decision-making process. The performance was evaluated on diabetes-related abstracts from PubMed.
METHODS: The methodology consists of 4 parts: (1) a novel interpretable hierarchical clustering of documents where each node is defined by headwords (words that best represent the documents in the node), (2) an efficient classification system to target topics, (3) minimized user interaction effort through active learning, and (4) a visual user interface. We evaluated our approach on 50,911 diabetes-related abstracts providing a hierarchical Medical Subject Headings (MeSH) structure, a unique identifier for a topic. Hierarchical clustering performance was compared against the implementation in the machine learning library scikit-learn. On a subset of 2000 randomly chosen diabetes abstracts, our active learning strategy was compared against 3 other strategies: random selection of training instances, uncertainty sampling that chooses instances about which the model is most uncertain, and an expected gradient length strategy based on convolutional neural networks (CNNs).
RESULTS: For the hierarchical clustering performance, we achieved an F1 score of 0.73 compared to 0.76 achieved by scikit-learn. Concerning active learning performance, after 200 training samples chosen by each strategy, the weighted F1 score over all MeSH codes was a satisfactory 0.62 using our approach, 0.61 using the uncertainty strategy, 0.63 using the CNN, and 0.45 using the random strategy. Moreover, our methodology showed constantly low memory use as the number of documents increased.
CONCLUSIONS: We proposed an easy-to-use tool with which health professionals with limited computer science knowledge can combine their domain knowledge with topic exploration and target specific topics of interest while improving transparency. Furthermore, our approach is memory efficient and highly parallelizable, making it interesting for Big Data sets. This approach can be used by health professionals to gain deep insights into biomedical literature to ultimately improve the evidence-based clinical decision-making process.
format Online
Article
Text
id pubmed-8808347
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8808347 2022-02-04 Improving Diabetes-Related Biomedical Literature Exploration in the Clinical Decision-making Process via Interactive Classification and Topic Discovery: Methodology Development Study Ahne, Adrian Fagherazzi, Guy Tannier, Xavier Czernichow, Thomas Orchard, Francisco J Med Internet Res Original Paper JMIR Publications 2022-01-18 /pmc/articles/PMC8808347/ /pubmed/35040795 http://dx.doi.org/10.2196/27434 Text en ©Adrian Ahne, Guy Fagherazzi, Xavier Tannier, Thomas Czernichow, Francisco Orchard. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.01.2022.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
title Improving Diabetes-Related Biomedical Literature Exploration in the Clinical Decision-making Process via Interactive Classification and Topic Discovery: Methodology Development Study
topic Original Paper