
Conserving Semantic Unit Information and Simplifying Syntactic Constituents to Improve Implicit Discourse Relation Recognition

Implicit discourse relation recognition (IDRR) has long been considered a challenging problem in shallow discourse parsing. The absence of connectives makes such relations implicit and requires much more effort to understand the semantics of the text. Thus, it is important to preserve the semantic c...


Bibliographic Details
Main authors: Fang, Zhongyang, Cong, Yue, Chai, Yuhan, Gao, Chengliang, Chen, Ximing, Qiu, Jing
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10529030/
https://www.ncbi.nlm.nih.gov/pubmed/37761593
http://dx.doi.org/10.3390/e25091294
_version_ 1785111339586289664
author Fang, Zhongyang
Cong, Yue
Chai, Yuhan
Gao, Chengliang
Chen, Ximing
Qiu, Jing
author_facet Fang, Zhongyang
Cong, Yue
Chai, Yuhan
Gao, Chengliang
Chen, Ximing
Qiu, Jing
author_sort Fang, Zhongyang
collection PubMed
description Implicit discourse relation recognition (IDRR) has long been considered a challenging problem in shallow discourse parsing. The absence of connectives makes such relations implicit and requires much more effort to understand the semantics of the text. Thus, it is important to preserve semantic completeness before any attempt to predict the discourse relation. However, word-level embedding, widely used in existing work, may lead to a loss of semantics by splitting phrases that should be treated as complete semantic units. In this article, we propose three methods to segment a sentence into complete semantic units: a corpus-based method that serves as the baseline, a constituent parsing tree-based method, and a dependency parsing tree-based method that provides a more flexible and automatic way to divide the sentence. The segmented sentence is then embedded at the level of semantic units, so that the embeddings can be fed into IDRR networks and play the same role as word embeddings. We integrated our methods into a recent IDRR model and compared its performance with the original version, which uses word-level embeddings. The results show that a proper embedding level better conserves the semantic information in the sentence and helps enhance the performance of IDRR models.
format Online
Article
Text
id pubmed-10529030
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10529030 2023-09-28 Conserving Semantic Unit Information and Simplifying Syntactic Constituents to Improve Implicit Discourse Relation Recognition Fang, Zhongyang Cong, Yue Chai, Yuhan Gao, Chengliang Chen, Ximing Qiu, Jing Entropy (Basel) Article Implicit discourse relation recognition (IDRR) has long been considered a challenging problem in shallow discourse parsing. The absence of connectives makes such relations implicit and requires much more effort to understand the semantics of the text. Thus, it is important to preserve semantic completeness before any attempt to predict the discourse relation. However, word-level embedding, widely used in existing work, may lead to a loss of semantics by splitting phrases that should be treated as complete semantic units. In this article, we propose three methods to segment a sentence into complete semantic units: a corpus-based method that serves as the baseline, a constituent parsing tree-based method, and a dependency parsing tree-based method that provides a more flexible and automatic way to divide the sentence. The segmented sentence is then embedded at the level of semantic units, so that the embeddings can be fed into IDRR networks and play the same role as word embeddings. We integrated our methods into a recent IDRR model and compared its performance with the original version, which uses word-level embeddings. The results show that a proper embedding level better conserves the semantic information in the sentence and helps enhance the performance of IDRR models. MDPI 2023-09-04 /pmc/articles/PMC10529030/ /pubmed/37761593 http://dx.doi.org/10.3390/e25091294 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Fang, Zhongyang
Cong, Yue
Chai, Yuhan
Gao, Chengliang
Chen, Ximing
Qiu, Jing
Conserving Semantic Unit Information and Simplifying Syntactic Constituents to Improve Implicit Discourse Relation Recognition
title Conserving Semantic Unit Information and Simplifying Syntactic Constituents to Improve Implicit Discourse Relation Recognition
title_full Conserving Semantic Unit Information and Simplifying Syntactic Constituents to Improve Implicit Discourse Relation Recognition
title_fullStr Conserving Semantic Unit Information and Simplifying Syntactic Constituents to Improve Implicit Discourse Relation Recognition
title_full_unstemmed Conserving Semantic Unit Information and Simplifying Syntactic Constituents to Improve Implicit Discourse Relation Recognition
title_short Conserving Semantic Unit Information and Simplifying Syntactic Constituents to Improve Implicit Discourse Relation Recognition
title_sort conserving semantic unit information and simplifying syntactic constituents to improve implicit discourse relation recognition
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10529030/
https://www.ncbi.nlm.nih.gov/pubmed/37761593
http://dx.doi.org/10.3390/e25091294