
Paragraph-level attention based deep model for chapter segmentation

Books are usually divided into chapters and sections. Correctly and automatically recognizing chapter boundaries can work as a proxy when segmenting long texts (a more general task). Book chapters can be easily segmented by humans, but automatic segmentation is more challenging because the data is semi-structured. Since language is prone to ambiguity, it is essential to identify the relationship between the words in each paragraph and to classify consecutive paragraphs based on their relationships with one another. Although researchers have designed deep learning-based models to solve this problem, these approaches have not considered the paragraph-level semantics among consecutive paragraphs. In this article, we propose a novel deep learning-based method to segment book chapters that uses paragraph-level semantics and an attention mechanism. We first utilized a pre-trained XLNet model connected to a convolutional neural network (CNN) to extract the semantic meaning of each paragraph. Then, we measured the similarities in the semantics of each paragraph and designed an attention mechanism to inject the similarity information in order to better predict the chapter boundaries. The experimental results indicated that the performance of our proposed method can surpass those of other state-of-the-art (SOTA) methods for chapter segmentation on public datasets (the proposed model achieved an F1 score of 0.8856, outperforming the Bidirectional Encoder Representations from Transformers (BERT) model’s F1 score of 0.6640). The ablation study also illustrated that the paragraph-level attention mechanism could produce a significant increase in performance.
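
The abstract describes the pipeline only at a high level: a pre-trained XLNet feeding a CNN to embed each paragraph, followed by attention over paragraph-to-paragraph similarities to predict chapter boundaries. Below is a minimal sketch of that idea in PyTorch with Hugging Face transformers; the class names, the "xlnet-base-cased" checkpoint, the pooling choices, and the use of cosine similarity as the attention score are illustrative assumptions, not the author's implementation.

```python
# Minimal sketch under the assumptions stated above, not the author's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import XLNetModel


class ParagraphEncoder(nn.Module):
    """Embed one paragraph: XLNet token states -> 1-D CNN -> max-pooled vector."""

    def __init__(self, cnn_channels: int = 256, kernel_size: int = 3):
        super().__init__()
        self.xlnet = XLNetModel.from_pretrained("xlnet-base-cased")
        self.conv = nn.Conv1d(self.xlnet.config.d_model, cnn_channels,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, d_model) token states from the pre-trained XLNet
        hidden = self.xlnet(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state
        feats = F.relu(self.conv(hidden.transpose(1, 2)))  # (batch, channels, seq_len)
        return feats.max(dim=2).values                     # one vector per paragraph


class BoundaryClassifier(nn.Module):
    """Score each paragraph for 'starts a new chapter' using similarity-based attention."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.out = nn.Linear(2 * dim, 2)  # paragraph vector + attended context -> 2 classes

    def forward(self, para_vecs):
        # para_vecs: (num_paragraphs, dim), paragraphs of one book in reading order.
        sims = F.cosine_similarity(para_vecs.unsqueeze(1),
                                   para_vecs.unsqueeze(0), dim=-1)  # (N, N) similarities
        attn = F.softmax(sims, dim=-1)        # paragraph-level attention weights
        context = attn @ para_vecs            # similarity-weighted context per paragraph
        return self.out(torch.cat([para_vecs, context], dim=-1))  # boundary logits
```

Training such a sketch would then minimize cross-entropy between the per-paragraph boundary logits and chapter-start labels.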


Bibliographic Details
Main Author: Virameteekul, Paveen
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2022
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9202623/
https://www.ncbi.nlm.nih.gov/pubmed/35721402
http://dx.doi.org/10.7717/peerj-cs.1003
Collection: PubMed
Record ID: pubmed-9202623
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PeerJ Comput Sci
Published online: 2022-06-10
License: © 2022 Virameteekul. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either the DOI or the URL of the article must be cited.