Chinese technical terminology extraction based on DC-value and information entropy
China's technology is developing rapidly, and the number of patent applications has surged. Technical managers and researchers therefore urgently need ways to apply computer technology to mine and analyze large numbers of Chinese patent documents in depth, so as to efficiently...
Main author: | Liwei, Zhang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9681760/ https://www.ncbi.nlm.nih.gov/pubmed/36414635 http://dx.doi.org/10.1038/s41598-022-23209-6 |
_version_ | 1784834694019284992 |
---|---|
author | Liwei, Zhang |
collection | PubMed |
description | China's technology is developing rapidly, and the number of patent applications has surged. Technical managers and researchers therefore urgently need ways to apply computer technology to mine and analyze large numbers of Chinese patent documents in depth, so that they can use patent information efficiently, pursue technological innovation, and avoid R&D risks. Automatic term extraction is the basis of patent mining and analysis, but many existing approaches focus on extracting domain terms in English and are difficult to extend to Chinese because of the differences between the two languages. At the same time, some common Chinese technical terminology extraction methods focus on high-frequency characteristics and pay less attention to the technical domain correlation and the unithood of terminology. To address these problems, this paper proposes a Chinese technical terminology extraction method based on DC-value and information entropy to automatically extract technical terminology from Chinese patents. The empirical results show that the presented algorithm can effectively extract technical terminology from Chinese patent literature and performs better than the C-value method, the log-likelihood ratio method, and the mutual information method, which gives it theoretical significance and practical application value. |
format | Online Article Text |
id | pubmed-9681760 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9681760 2022-11-24 Chinese technical terminology extraction based on DC-value and information entropy Liwei, Zhang Sci Rep Article Nature Publishing Group UK 2022-11-21 /pmc/articles/PMC9681760/ /pubmed/36414635 http://dx.doi.org/10.1038/s41598-022-23209-6 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Chinese technical terminology extraction based on DC-value and information entropy |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9681760/ https://www.ncbi.nlm.nih.gov/pubmed/36414635 http://dx.doi.org/10.1038/s41598-022-23209-6 |
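
The description above names information entropy as one ingredient of the method, but this record does not say how it is applied, and it does not define DC-value. As a rough illustration only, the Python sketch below computes the Shannon entropy of the characters adjacent to a candidate string, a common way to estimate unithood in Chinese term extraction. The function name `neighbor_entropy`, its parameters, and the sample text are assumptions made for this example and are not taken from the article.

```python
import math
from collections import Counter


def neighbor_entropy(corpus: str, candidate: str, side: str = "right") -> float:
    """Shannon entropy (in bits) of the characters next to `candidate`.

    Hypothetical helper for illustration; not the article's algorithm.
    A high value means the candidate appears in varied contexts, which
    suggests it behaves as a self-contained unit.
    """
    neighbors = Counter()
    start = corpus.find(candidate)
    while start != -1:
        # Index of the adjacent character on the requested side.
        idx = start - 1 if side == "left" else start + len(candidate)
        if 0 <= idx < len(corpus):
            neighbors[corpus[idx]] += 1
        # Continue the search one position later (overlaps are allowed).
        start = corpus.find(candidate, start + 1)

    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in neighbors.values())


if __name__ == "__main__":
    # Made-up sample text, used only to show the call.
    text = "锂电池隔膜的制备方法。锂电池隔膜及其应用。一种锂电池正极材料。"
    print(neighbor_entropy(text, "锂电池", side="right"))
    print(neighbor_entropy(text, "锂电池", side="left"))
```

A candidate that scores high on both sides is less likely to be a fragment of a longer term. How such a score would be combined with DC-value is not stated in this record.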