A hybrid Chinese word segmentation model for quality management-related texts based on transfer learning
Text information mining is a key step to data-driven automatic/semi-automatic quality management (QM). For Chinese texts, a word segmentation algorithm is necessary for pre-processing since there are no explicit marks to define word boundaries. Because of intrinsic characteristics of QM-related text...
Main Authors: | Wen, Peihan; Feng, Linhan; Zhang, Tian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9543942/ https://www.ncbi.nlm.nih.gov/pubmed/36206249 http://dx.doi.org/10.1371/journal.pone.0270154 |
_version_ | 1784804489064087552 |
---|---|
author | Wen, Peihan Feng, Linhan Zhang, Tian |
author_facet | Wen, Peihan Feng, Linhan Zhang, Tian |
author_sort | Wen, Peihan |
collection | PubMed |
description | Text information mining is a key step to data-driven automatic/semi-automatic quality management (QM). For Chinese texts, a word segmentation algorithm is necessary for pre-processing since there are no explicit marks to define word boundaries. Because of intrinsic characteristics of QM-related texts, word segmentation algorithms for normal Chinese texts cannot be directly applied. Hence, based on the analysis of QM-related texts, we summarized six features, and proposed a hybrid Chinese word segmentation model by means of integrating transfer learning (TL), bidirectional long-short term memory (Bi-LSTM), multi-head attention (MA), and conditional random field (CRF) to construct the mTL-Bi-LSTM-MA-CRF model, considering insufficient samples of QM-related texts and excessive cutting of idioms. The mTL-Bi-LSTM-MA-CRF model is composed of two steps. Firstly, based on a word embedding space, the Bi-LSTM is introduced for context information learning, and the MA mechanism is selected to allocate attention among subspaces, and then the CRF is used to learn label sequence constraints. Secondly, a modified TL method is put forward for text feature extraction, adaptive layer weights learning, and loss function correction for selective learning. Experimental results show that the proposed model can achieve good word segmentation results with only a relatively small set of samples. |
format | Online Article Text |
id | pubmed-9543942 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-95439422022-10-08 A hybrid Chinese word segmentation model for quality management-related texts based on transfer learning Wen, Peihan Feng, Linhan Zhang, Tian PLoS One Research Article Text information mining is a key step to data-driven automatic/semi-automatic quality management (QM). For Chinese texts, a word segmentation algorithm is necessary for pre-processing since there are no explicit marks to define word boundaries. Because of intrinsic characteristics of QM-related texts, word segmentation algorithms for normal Chinese texts cannot be directly applied. Hence, based on the analysis of QM-related texts, we summarized six features, and proposed a hybrid Chinese word segmentation model by means of integrating transfer learning (TL), bidirectional long-short term memory (Bi-LSTM), multi-head attention (MA), and conditional random field (CRF) to construct the mTL-Bi-LSTM-MA-CRF model, considering insufficient samples of QM-related texts and excessive cutting of idioms. The mTL-Bi-LSTM-MA-CRF model is composed of two steps. Firstly, based on a word embedding space, the Bi-LSTM is introduced for context information learning, and the MA mechanism is selected to allocate attention among subspaces, and then the CRF is used to learn label sequence constraints. Secondly, a modified TL method is put forward for text feature extraction, adaptive layer weights learning, and loss function correction for selective learning. Experimental results show that the proposed model can achieve good word segmentation results with only a relatively small set of samples. 
Public Library of Science 2022-10-07 /pmc/articles/PMC9543942/ /pubmed/36206249 http://dx.doi.org/10.1371/journal.pone.0270154 Text en © 2022 Wen et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Wen, Peihan Feng, Linhan Zhang, Tian A hybrid Chinese word segmentation model for quality management-related texts based on transfer learning |
title | A hybrid Chinese word segmentation model for quality management-related texts based on transfer learning |
title_sort | hybrid chinese word segmentation model for quality management-related texts based on transfer learning |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9543942/ https://www.ncbi.nlm.nih.gov/pubmed/36206249 http://dx.doi.org/10.1371/journal.pone.0270154 |
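
The description field above says the CRF layer is "used to learn label sequence constraints" on top of per-character context scores. The following is a minimal sketch of that idea, not the authors' implementation. Several things here are assumptions that the record does not state: the BMES tagging scheme (Begin, Middle, End, Single), the hand-written `ALLOWED` transition table standing in for learned CRF transition scores, and the made-up emission scores standing in for Bi-LSTM and multi-head attention outputs. All function names are hypothetical.

```python
import math

# Assumed tag set: B = word begin, M = middle, E = end, S = single-character word.
TAGS = ("B", "M", "E", "S")

# Hard label-sequence constraints (a simplification: a trained CRF would
# learn soft transition scores instead of this fixed table).
ALLOWED = {
    "B": ("M", "E"),
    "M": ("M", "E"),
    "E": ("B", "S"),
    "S": ("B", "S"),
}
START_OK = ("B", "S")  # a sentence can only start a word
END_OK = ("E", "S")    # a sentence can only end a word


def words_to_tags(words):
    """Convert a segmented sentence into one BMES tag per character."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their BMES tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a dangling B/M at the end
        words.append(current)
    return words


def viterbi(emissions):
    """Return the highest-scoring tag path that obeys ALLOWED.

    emissions: one dict per character mapping each tag to a finite score.
    """
    if not emissions:
        return []
    neg_inf = -math.inf
    score = {t: (emissions[0][t] if t in START_OK else neg_inf) for t in TAGS}
    backpointers = []
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for t in TAGS:
            best_prev, best = None, neg_inf
            for p in TAGS:
                if t in ALLOWED[p] and score[p] > best:
                    best_prev, best = p, score[p]
            new_score[t] = best + em[t] if best_prev is not None else neg_inf
            ptr[t] = best_prev
        score = new_score
        backpointers.append(ptr)
    last = max(END_OK, key=lambda t: score[t])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    chars = "质量管理"
    # Hypothetical scores; per-character argmax would pick B, B, E, E,
    # which is not a valid BMES sequence.
    emissions = [
        {"B": 2.0, "M": 0.0, "E": 0.0, "S": 1.0},
        {"B": 1.5, "M": 1.0, "E": 0.0, "S": 0.0},
        {"B": 0.0, "M": 1.0, "E": 1.2, "S": 0.0},
        {"B": 0.0, "M": 0.0, "E": 2.0, "S": 0.5},
    ]
    tags = viterbi(emissions)
    print(tags, tags_to_words(chars, tags))
```

In this sketch the constraint table is what keeps a multi-character term from being cut apart when individual character scores disagree, which is the role the abstract assigns to the CRF layer; how the published model weights those transitions is not given in this record.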