A hybrid Chinese word segmentation model for quality management-related texts based on transfer learning

Text information mining is a key step to data-driven automatic/semi-automatic quality management (QM). For Chinese texts, a word segmentation algorithm is necessary for pre-processing since there are no explicit marks to define word boundaries. Because of intrinsic characteristics of QM-related text...

Bibliographic Details
Main Authors: Wen, Peihan, Feng, Linhan, Zhang, Tian
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9543942/
https://www.ncbi.nlm.nih.gov/pubmed/36206249
http://dx.doi.org/10.1371/journal.pone.0270154
author Wen, Peihan
Feng, Linhan
Zhang, Tian
collection PubMed
description Text information mining is a key step in data-driven automatic or semi-automatic quality management (QM). For Chinese texts, a word segmentation algorithm is a necessary pre-processing step, since there are no explicit marks to define word boundaries. Because of the intrinsic characteristics of QM-related texts, word segmentation algorithms designed for general Chinese texts cannot be applied directly. Hence, based on an analysis of QM-related texts, we summarized six features and proposed a hybrid Chinese word segmentation model that integrates transfer learning (TL), bidirectional long short-term memory (Bi-LSTM), multi-head attention (MA), and a conditional random field (CRF) into the mTL-Bi-LSTM-MA-CRF model, which addresses the insufficient samples of QM-related texts and the excessive cutting of idioms. The mTL-Bi-LSTM-MA-CRF model is built in two steps. First, on top of a word embedding space, the Bi-LSTM learns context information, the MA mechanism allocates attention among subspaces, and the CRF learns label sequence constraints. Second, a modified TL method is put forward for text feature extraction, adaptive layer-weight learning, and loss function correction for selective learning. Experimental results show that the proposed model can achieve good word segmentation results with only a relatively small set of samples.
format Online
Article
Text
id pubmed-9543942
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9543942 2022-10-08 PLoS One Research Article
Public Library of Science 2022-10-07 /pmc/articles/PMC9543942/ /pubmed/36206249 http://dx.doi.org/10.1371/journal.pone.0270154 Text en © 2022 Wen et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A hybrid Chinese word segmentation model for quality management-related texts based on transfer learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9543942/
https://www.ncbi.nlm.nih.gov/pubmed/36206249
http://dx.doi.org/10.1371/journal.pone.0270154