Single document text summarization addressed with a cat swarm optimization approach
Main authors: | , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9510417/ https://www.ncbi.nlm.nih.gov/pubmed/36187330 http://dx.doi.org/10.1007/s10489-022-04149-0 |
Summary: | The availability of a tremendous amount of online information has brought about broad interest in extracting relevant information in a compact and meaningful way, prompting the need for automatic text summarization. In the proposed system, automated text summarization is treated as an extractive single-document summarization problem, and a Cat Swarm Optimization (CSO) algorithm-based approach is proposed to solve it, with the objective of generating good summaries in terms of content coverage, informativeness, anti-redundancy, and readability. In this work, input documents are pre-processed first. Then the cat population is initialized: each individual (cat), represented as a binary vector, is randomly initialized in the search space subject to the constraint. The objective function is then formulated from different sentence quality measures. The Best Cat Memory Pool (BCMP) is initialized based on the objective function score. After that, in each iteration, individuals are randomly assigned to seeking or tracing mode according to the mixture ratio, and their positions are updated accordingly. The BCMP is updated as well. Finally, after the last iteration, an optimal individual is chosen to generate the summary. The DUC-2001 and DUC-2002 data sets and ROUGE measures are used for system evaluation, and the results are compared with various state-of-the-art methods. We achieved approximately 25% and 5% improvements in ROUGE-1 and ROUGE-2 scores, respectively, on the datasets over the best existing method mentioned in this paper, indicating the proposed method’s superiority. The proposed system is also evaluated on generational distance, CPU processing time, cohesion, and readability, indicating that the system-generated summaries are readable, concise, and relevant, and that they are produced quickly. We also conducted a two-sample t-test and a one-way ANOVA test, which show that the results of the proposed approach are statistically significant. |
---|
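The workflow described in the summary (binary cats, a constrained random initialization, an objective built from sentence quality measures, and seeking/tracing updates governed by a mixture ratio) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name and parameter value here (`fitness`, `random_cat`, `cso_summarize`, `smp`, `mixture_ratio=0.2`) is hypothetical, the objective is a toy "coverage minus redundancy" score, the constraint is assumed to be a summary length limit, a single best-so-far cat stands in for the Best Cat Memory Pool, seeking mode greedily keeps the best one-bit-flip copy, and tracing mode is approximated by copying bits from the best cat.

```python
import random


def fitness(cat, sim_to_doc, pair_sim, lengths, max_len):
    """Toy objective (assumption): coverage minus redundancy.

    Empty or over-length summaries are treated as infeasible (-inf).
    """
    chosen = [i for i, bit in enumerate(cat) if bit]
    if not chosen or sum(lengths[i] for i in chosen) > max_len:
        return float("-inf")
    coverage = sum(sim_to_doc[i] for i in chosen)
    redundancy = sum(pair_sim[i][j] for i in chosen for j in chosen if i < j)
    return coverage - redundancy


def random_cat(n, lengths, max_len, rng):
    """Random binary vector that never exceeds the length limit."""
    order = list(range(n))
    rng.shuffle(order)
    cat, total = [0] * n, 0
    for i in order:
        if rng.random() < 0.5 and total + lengths[i] <= max_len:
            cat[i] = 1
            total += lengths[i]
    return cat


def cso_summarize(sim_to_doc, pair_sim, lengths, max_len,
                  n_cats=20, iters=50, mixture_ratio=0.2, smp=5, seed=0):
    """Return (selected sentence indices, score) for the best cat found."""
    rng = random.Random(seed)
    n = len(lengths)

    def score(cat):
        return fitness(cat, sim_to_doc, pair_sim, lengths, max_len)

    cats = [random_cat(n, lengths, max_len, rng) for _ in range(n_cats)]
    best = max(cats, key=score)[:]  # single best cat, a stand-in for the BCMP

    for _ in range(iters):
        for k, cat in enumerate(cats):
            if rng.random() < mixture_ratio:
                # Tracing mode (simplified): copy each bit from the best cat
                # with probability 0.5, otherwise keep the cat's own bit.
                new = [b if rng.random() < 0.5 else c
                       for c, b in zip(cat, best)]
            else:
                # Seeking mode (simplified): make `smp` copies, flip one
                # random bit in each, and keep the best candidate.
                copies = []
                for _ in range(smp):
                    cand = cat[:]
                    cand[rng.randrange(n)] ^= 1
                    copies.append(cand)
                new = max(copies + [cat], key=score)
            cats[k] = new
            if score(new) > score(best):
                best = new[:]

    return [i for i, bit in enumerate(best) if bit], score(best)
```

In this sketch `sim_to_doc[i]` is assumed to be a precomputed relevance score of sentence `i` to the whole document, `pair_sim[i][j]` a precomputed sentence-to-sentence similarity, and `lengths[i]` a sentence length in words. Because infeasible candidates score negative infinity, they can never replace the stored best cat, so the returned selection always respects `max_len`.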