
Chinese technical terminology extraction based on DC-value and information entropy

China's technology is developing rapidly, and the number of patent applications has surged. Technical managers and researchers therefore urgently need ways to apply computer technology to in-depth mining and analysis of large numbers of Chinese patent documents to efficiently...


Bibliographic Details
Main author: Liwei, Zhang
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9681760/
https://www.ncbi.nlm.nih.gov/pubmed/36414635
http://dx.doi.org/10.1038/s41598-022-23209-6
author Liwei, Zhang
collection PubMed
description China's technology is developing rapidly, and the number of patent applications has surged. Technical managers and researchers therefore urgently need ways to apply computer technology to in-depth mining and analysis of large numbers of Chinese patent documents, so that they can use patent information efficiently, pursue technological innovation, and avoid R&D risks. Automatic term extraction is the basis of patent mining and analysis, but many existing approaches focus on extracting domain terms in English and are difficult to extend to Chinese because of the differences between the two languages. At the same time, some common Chinese technical terminology extraction methods focus on high-frequency characteristics, while the technical domain correlation and the unithood of terminology receive less attention. To address these problems, this paper proposes a Chinese technical terminology extraction method based on DC-value and information entropy to automatically extract technical terminology from Chinese patents. The empirical results show that the presented algorithm can effectively extract technical terminology from Chinese patent literature and performs better than the C-value method, the log-likelihood ratio method, and the mutual information method, which gives it theoretical significance and practical application value.
format Online
Article
Text
id pubmed-9681760
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9681760 2022-11-24 Chinese technical terminology extraction based on DC-value and information entropy Liwei, Zhang Sci Rep Article
Nature Publishing Group UK 2022-11-21 /pmc/articles/PMC9681760/ /pubmed/36414635 http://dx.doi.org/10.1038/s41598-022-23209-6 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Chinese technical terminology extraction based on DC-value and information entropy
topic Article