DAT-MT Accelerated Graph Fusion Dependency Parsing Model for Small Samples in Professional Fields

The rapid development of information technology has made the amount of information in massive texts far exceed human intuitive cognition, and dependency parsing can effectively deal with information overload. Against the background of domain specialization, the migration and application of syntactic tree...

Bibliographic Details
Main Authors: Li, Rui, Shu, Shili, Wang, Shunli, Liu, Yang, Li, Yanhao, Peng, Mingjun
Format: Online Article Text
Language: English
Published: MDPI 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10606639/
https://www.ncbi.nlm.nih.gov/pubmed/37895565
http://dx.doi.org/10.3390/e25101444
author Li, Rui
Shu, Shili
Wang, Shunli
Liu, Yang
Li, Yanhao
Peng, Mingjun
collection PubMed
description The rapid development of information technology has pushed the volume of information in massive text collections far beyond what humans can absorb intuitively, and dependency parsing is an effective way to cope with this information overload. As domains become more specialized, the migration and application of syntactic treebanks and the speed of syntactic analysis models become key to parsing efficiency. To migrate syntactic treebanks across domains and speed up text parsing, this paper proposes a novel approach: the Double-Array Trie and Multi-threading (DAT-MT) accelerated graph fusion dependency parsing model. It combines the specialized syntactic features from a small-scale professional-field corpus with the generalized syntactic features from a large-scale news corpus, which improves the accuracy of syntactic relation recognition. To address the high space and time complexity introduced by the graph fusion model, the DAT-MT method is proposed. It rapidly maps massive Chinese character features to the model’s prior parameters and parallelizes the computation, thereby improving parsing speed. The experimental results show that the unlabeled attachment score (UAS) and the labeled attachment score (LAS) of the model improve by 13.34% and 14.82% over the model using only the professional-field corpus, and by 3.14% and 3.40% over the model using only the news corpus; both scores also exceed those of the deep-learning-based DDParser and LTP 4. Additionally, the method achieves a speedup of about 3.7 times over a single-threaded method using a red-black tree index. Efficient and accurate syntactic analysis will benefit the real-time processing of massive texts in professional fields, for example in multi-dimensional semantic correlation, professional feature extraction, and domain knowledge graph construction.
format Online
Article
Text
id pubmed-10606639
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10606639 2023-10-28 DAT-MT Accelerated Graph Fusion Dependency Parsing Model for Small Samples in Professional Fields Li, Rui Shu, Shili Wang, Shunli Liu, Yang Li, Yanhao Peng, Mingjun Entropy (Basel) Article MDPI 2023-10-12 /pmc/articles/PMC10606639/ /pubmed/37895565 http://dx.doi.org/10.3390/e25101444 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title DAT-MT Accelerated Graph Fusion Dependency Parsing Model for Small Samples in Professional Fields
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10606639/
https://www.ncbi.nlm.nih.gov/pubmed/37895565
http://dx.doi.org/10.3390/e25101444