Paragraph-level attention based deep model for chapter segmentation
Books are usually divided into chapters and sections. Correctly and automatically recognizing chapter boundaries can work as a proxy when segmenting long texts (a more general task). Book chapters can be easily segmented by humans, but automatic segregation is more challenging because the data is se...
Main author: | Virameteekul, Paveen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2022 |
Subjects: | Artificial Intelligence |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9202623/ https://www.ncbi.nlm.nih.gov/pubmed/35721402 http://dx.doi.org/10.7717/peerj-cs.1003 |
_version_ | 1784728569569607680 |
---|---|
author | Virameteekul, Paveen |
author_facet | Virameteekul, Paveen |
author_sort | Virameteekul, Paveen |
collection | PubMed |
description | Books are usually divided into chapters and sections. Correctly and automatically recognizing chapter boundaries can work as a proxy when segmenting long texts (a more general task). Book chapters can be easily segmented by humans, but automatic segmentation is more challenging because the data is semi-structured. Because language is prone to ambiguity, it is essential to identify the relationship between the words in each paragraph and classify consecutive paragraphs based on their relationships with one another. Although researchers have designed deep learning-based models to solve this problem, these approaches have not considered the paragraph-level semantics among consecutive paragraphs. In this article, we propose a novel deep learning-based method to segment book chapters that uses paragraph-level semantics and an attention mechanism. We first utilized a pre-trained XLNet model connected to a convolutional neural network (CNN) to extract the semantic meaning of each paragraph. Then, we measured the similarities in the semantics of each paragraph and designed an attention mechanism to inject the similarity information to better predict the chapter boundaries. The experimental results indicated that our proposed method can outperform other state-of-the-art (SOTA) methods for chapter segmentation on public datasets: the proposed model achieved an F1 score of 0.8856, compared with an F1 score of 0.6640 for the Bidirectional Encoder Representations from Transformers (BERT) model. The ablation study also illustrated that the paragraph-level attention mechanism could produce a significant increase in performance. |
format | Online Article Text |
id | pubmed-9202623 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-92026232022-06-17 Paragraph-level attention based deep model for chapter segmentation Virameteekul, Paveen PeerJ Comput Sci Artificial Intelligence Books are usually divided into chapters and sections. Correctly and automatically recognizing chapter boundaries can work as a proxy when segmenting long texts (a more general task). Book chapters can be easily segmented by humans, but automatic segregation is more challenging because the data is semi-structured. Since the concept of language is prone to ambiguity, it is essential to identify the relationship between the words in each paragraph and classify each consecutive paragraph based on their respective relationships with one another. Although researchers have designed deep learning-based models to solve this problem, these approaches have not considered the paragraph-level semantics among the consecutive paragraphs. In this article, we propose a novel deep learning-based method to segment book chapters that uses paragraph-level semantics and an attention mechanism. We first utilized a pre-trained XLNet model connected to a convolutional neural network (CNN) to extract the semantic meaning of each paragraph. Then, we measured the similarities in the semantics of each paragraph and designed an attention mechanism to inject the similarity information in order to better predict the chapter boundaries. The experimental results indicated that the performance of our proposed method can surpass those of other state-of-the-art (SOTA) methods for chapter segmentation on public datasets (the proposed model achieved an F1 score of 0.8856, outperforming the Bidirectional Encoder Representations from Transformers (BERT) model’s F1 score of 0.6640). The ablation study also illustrated that the paragraph-level attention mechanism could produce a significant increase in performance. PeerJ Inc. 
2022-06-10 /pmc/articles/PMC9202623/ /pubmed/35721402 http://dx.doi.org/10.7717/peerj-cs.1003 Text en © 2022 Virameteekul https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Artificial Intelligence Virameteekul, Paveen Paragraph-level attention based deep model for chapter segmentation |
title | Paragraph-level attention based deep model for chapter segmentation |
title_sort | paragraph-level attention based deep model for chapter segmentation |
topic | Artificial Intelligence |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9202623/ https://www.ncbi.nlm.nih.gov/pubmed/35721402 http://dx.doi.org/10.7717/peerj-cs.1003 |
work_keys_str_mv | AT virameteekulpaveen paragraphlevelattentionbaseddeepmodelforchaptersegmentation |
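
The description above outlines three ideas: a vector per paragraph, a similarity measure between consecutive paragraphs, and attention weights derived from that similarity. The following is a minimal sketch of those ideas only, not the article's model. It assumes paragraph embeddings are already available (the toy 2-dimensional vectors stand in for what an XLNet + CNN encoder might produce), uses cosine similarity and a softmax over neighbours as an assumed attention form, and marks a boundary with a hypothetical threshold of 0.5. All function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(xs):
    """Numerically stable softmax over a non-empty list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(embeddings, i, window=1):
    """Similarity-weighted average of the neighbours of paragraph i.

    Assumes at least two paragraphs, so that i has a neighbour.
    Returns the context vector and the weight given to each neighbour.
    """
    lo = max(0, i - window)
    hi = min(len(embeddings), i + window + 1)
    neighbours = [j for j in range(lo, hi) if j != i]
    weights = softmax([cosine(embeddings[i], embeddings[j]) for j in neighbours])
    dim = len(embeddings[i])
    context = [
        sum(w * embeddings[j][d] for w, j in zip(weights, neighbours))
        for d in range(dim)
    ]
    return context, dict(zip(neighbours, weights))


def boundary_scores(embeddings):
    """Score for paragraph i (i >= 1) opening a new chapter:
    one minus its cosine similarity to the previous paragraph."""
    return {
        i: 1.0 - cosine(embeddings[i - 1], embeddings[i])
        for i in range(1, len(embeddings))
    }


def f1_score(predicted, actual):
    """F1 over sets of predicted and true boundary positions."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Toy embeddings (assumed): paragraphs 0-1 share a topic, 2-3 share another.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
scores = boundary_scores(embeddings)
predicted = {i for i, s in scores.items() if s > 0.5}  # hypothetical threshold
context, weights = attend(embeddings, 1)
```

In this toy run, paragraph 1 gives more attention to paragraph 0 than to paragraph 2, and the only position whose score exceeds the threshold is paragraph 2, where the topic changes. A trained model would learn the scoring function rather than use a fixed threshold.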