
Paragraph-level attention based deep model for chapter segmentation

Books are usually divided into chapters and sections. Correctly and automatically recognizing chapter boundaries can work as a proxy when segmenting long texts (a more general task). Book chapters can be easily segmented by humans, but automatic segmentation is more challenging because the data is semi-structured. Since language is prone to ambiguity, it is essential to identify the relationship between the words in each paragraph and to classify consecutive paragraphs based on their relationships with one another. Although researchers have designed deep learning-based models to solve this problem, these approaches have not considered the paragraph-level semantics among consecutive paragraphs. In this article, we propose a novel deep learning-based method to segment book chapters that uses paragraph-level semantics and an attention mechanism. We first utilized a pre-trained XLNet model connected to a convolutional neural network (CNN) to extract the semantic meaning of each paragraph. Then, we measured the similarities in the semantics of each paragraph and designed an attention mechanism to inject the similarity information in order to better predict the chapter boundaries. The experimental results indicated that the performance of our proposed method can surpass those of other state-of-the-art (SOTA) methods for chapter segmentation on public datasets (the proposed model achieved an F1 score of 0.8856, outperforming the Bidirectional Encoder Representations from Transformers (BERT) model’s F1 score of 0.6640). The ablation study also illustrated that the paragraph-level attention mechanism could produce a significant increase in performance.
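
The abstract describes the pipeline only at a high level: a pre-trained XLNet feeding a CNN to embed each paragraph, followed by attention over paragraph-to-paragraph similarities to predict chapter boundaries. Below is a minimal sketch of that idea in PyTorch with Hugging Face transformers; the class names, the "xlnet-base-cased" checkpoint, the pooling choices, and the use of cosine similarity as the attention score are illustrative assumptions, not the author's implementation.

```python
# Minimal sketch under the assumptions stated above, not the author's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import XLNetModel


class ParagraphEncoder(nn.Module):
    """Embed one paragraph: XLNet token states -> 1-D CNN -> max-pooled vector."""

    def __init__(self, cnn_channels: int = 256, kernel_size: int = 3):
        super().__init__()
        self.xlnet = XLNetModel.from_pretrained("xlnet-base-cased")
        self.conv = nn.Conv1d(self.xlnet.config.d_model, cnn_channels,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, d_model) token states from the pre-trained XLNet
        hidden = self.xlnet(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state
        feats = F.relu(self.conv(hidden.transpose(1, 2)))  # (batch, channels, seq_len)
        return feats.max(dim=2).values                     # one vector per paragraph


class BoundaryClassifier(nn.Module):
    """Score each paragraph for 'starts a new chapter' using similarity-based attention."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.out = nn.Linear(2 * dim, 2)  # paragraph vector + attended context -> 2 classes

    def forward(self, para_vecs):
        # para_vecs: (num_paragraphs, dim), paragraphs of one book in reading order.
        sims = F.cosine_similarity(para_vecs.unsqueeze(1),
                                   para_vecs.unsqueeze(0), dim=-1)  # (N, N) similarities
        attn = F.softmax(sims, dim=-1)        # paragraph-level attention weights
        context = attn @ para_vecs            # similarity-weighted context per paragraph
        return self.out(torch.cat([para_vecs, context], dim=-1))  # boundary logits
```

Training such a sketch would then minimize cross-entropy between the per-paragraph boundary logits and chapter-start labels.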


Bibliographic Details
Main Author: Virameteekul, Paveen
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2022
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9202623/
https://www.ncbi.nlm.nih.gov/pubmed/35721402
http://dx.doi.org/10.7717/peerj-cs.1003
Collection: PubMed
Record ID: pubmed-9202623
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PeerJ Comput Sci
Published online: 2022-06-10
License: © 2022 Virameteekul. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either the DOI or the URL of the article must be cited.