Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling

Text mining is an important research direction, which involves several fields, such as information retrieval, information extraction, and text categorization. In this paper, we propose an efficient multiple classifier approach to text categorization based on swarm-optimized topic modelling. The Late...

Bibliographic Details
Main author: Onan, Aytuğ
Format: Online Article Text
Language: English
Published: Hindawi 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6081524/
https://www.ncbi.nlm.nih.gov/pubmed/30140300
http://dx.doi.org/10.1155/2018/2497471
_version_ 1783345664935067648
author Onan, Aytuğ
author_facet Onan, Aytuğ
author_sort Onan, Aytuğ
collection PubMed
description Text mining is an important research direction that involves several fields, such as information retrieval, information extraction, and text categorization. In this paper, we propose an efficient multiple classifier approach to text categorization based on swarm-optimized topic modelling. Latent Dirichlet allocation (LDA) can overcome the high dimensionality problem of the vector space model, but identifying appropriate parameter values is critical to the performance of LDA. The swarm-optimized approach estimates the parameters of LDA, including the number of topics and all the other parameters involved in LDA. The hybrid ensemble pruning approach, based on combined diversity measures and clustering, aims to obtain a multiple classifier system with high predictive performance and better diversity. In this scheme, four different diversity measures among the classifiers of the ensemble (namely, the disagreement measure, the Q-statistic, the correlation coefficient, and the double fault measure) are combined. Based on the combined diversity matrix, a swarm intelligence based clustering algorithm is employed to partition the classifiers into a number of disjoint groups, and one classifier (the one with the highest predictive performance) is selected from each cluster to build the final multiple classifier system. Experiments were conducted on five biomedical text benchmarks. In the swarm-optimized LDA, different metaheuristic algorithms (such as genetic algorithms, particle swarm optimization, the firefly algorithm, the cuckoo search algorithm, and the bat algorithm) are considered. In the ensemble pruning, five metaheuristic clustering algorithms are evaluated. The experimental results on the biomedical text benchmarks indicate that swarm-optimized LDA yields better predictive performance than conventional LDA. In addition, the proposed multiple classifier system outperforms conventional classification algorithms, ensemble learning methods, and ensemble pruning methods.
format Online
Article
Text
id pubmed-6081524
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-6081524 2018-08-23 Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling Onan, Aytuğ Comput Math Methods Med Research Article Hindawi 2018-07-22 /pmc/articles/PMC6081524/ /pubmed/30140300 http://dx.doi.org/10.1155/2018/2497471 Text en Copyright © 2018 Aytuğ Onan. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Onan, Aytuğ
Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling
title Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling
title_full Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling
title_fullStr Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling
title_full_unstemmed Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling
title_short Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling
title_sort biomedical text categorization based on ensemble pruning and optimized topic modelling
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6081524/
https://www.ncbi.nlm.nih.gov/pubmed/30140300
http://dx.doi.org/10.1155/2018/2497471
work_keys_str_mv AT onanaytug biomedicaltextcategorizationbasedonensemblepruningandoptimizedtopicmodelling
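
Illustrative note: the description above names four pairwise diversity measures and a "best classifier per cluster" selection step. The following is a minimal sketch of those two pieces, using the standard textbook definitions of the measures computed from per-sample correctness of two classifiers. It is not the article's implementation. The function names, the fallback value of 0.0 for a zero denominator, and the assumption that cluster labels are already available (the record does not say how the four measures are combined or how the swarm-based clustering is configured) are all hypothetical choices made for this example.

```python
from math import sqrt


def pairwise_diversity(correct_i, correct_j):
    """Compute four pairwise diversity measures for two classifiers.

    correct_i, correct_j: equal-length sequences of booleans, True where
    the respective classifier predicted the sample correctly.
    """
    if len(correct_i) != len(correct_j) or len(correct_i) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Contingency counts: n11 = both right, n00 = both wrong,
    # n10 = only i right, n01 = only j right.
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(correct_i, correct_j):
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1

    n = len(correct_i)
    ad, bc = n11 * n00, n01 * n10
    q_den = ad + bc
    rho_den = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return {
        "disagreement": (n01 + n10) / n,
        "double_fault": n00 / n,
        # Assumption: report 0.0 when a denominator is zero.
        "q_statistic": (ad - bc) / q_den if q_den else 0.0,
        "correlation": (ad - bc) / rho_den if rho_den else 0.0,
    }


def select_representatives(cluster_labels, accuracies):
    """Return the index of the most accurate classifier in each cluster.

    cluster_labels[k] is the (hypothetical, precomputed) cluster of
    classifier k; accuracies[k] is its validation accuracy.
    """
    best = {}
    for idx, (label, acc) in enumerate(zip(cluster_labels, accuracies)):
        if label not in best or acc > accuracies[best[label]]:
            best[label] = idx
    return sorted(best.values())
```

For instance, given five classifiers grouped into two clusters, select_representatives keeps one index per cluster, so the pruned ensemble has as many members as there are clusters.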