
Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification

Bibliographic Details
Main authors:	Hu, Jie; Li, Shaobo; Yao, Yong; Yu, Liya; Yang, Guanci; Hu, Jianjun
Format:	Online Article Text
Language:	English
Published:	MDPI 2018
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512597/ https://www.ncbi.nlm.nih.gov/pubmed/33265195 http://dx.doi.org/10.3390/e20020104

author	Hu, Jie; Li, Shaobo; Yao, Yong; Yu, Liya; Yang, Guanci; Hu, Jianjun
collection	PubMed
description	Many text mining tasks, such as text retrieval, text summarization, and text comparison, depend on extracting representative keywords from the main text. Most existing keyword extraction algorithms rely on a discrete, bag-of-words representation of the text. In this paper, we propose a patent keyword extraction algorithm (PKEA) based on the distributed Skip-gram model for patent classification. We also develop a set of quantitative performance measures for evaluating keyword extraction, based on information gain and on cross-validated Support Vector Machine (SVM) classification; these measures are valuable when human-annotated keywords are not available. We used a standard benchmark dataset and a self-built patent dataset to evaluate the performance of PKEA. Our patent dataset includes 2500 patents from five distinct technological fields related to autonomous cars (GPS systems, lidar systems, object recognition systems, radar systems, and vehicle control systems). We compared our method with Frequency, Term Frequency-Inverse Document Frequency (TF-IDF), TextRank, and Rapid Automatic Keyword Extraction (RAKE). The experimental results show that the proposed algorithm provides a promising way to extract keywords from patent texts for patent classification.
format	Online Article Text
id	pubmed-7512597
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	MDPI
record_format	MEDLINE/PubMed
journal	Entropy (Basel)
published	MDPI, 2018-02-02
rights	© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512597/ https://www.ncbi.nlm.nih.gov/pubmed/33265195 http://dx.doi.org/10.3390/e20020104
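The description mentions keyword-evaluation measures based on information gain, but this record does not give the paper's exact formulation. The following is a minimal sketch of the standard information gain of a binary "keyword occurs in document" feature with respect to document class labels, as commonly used in text categorization: IG(w) = H(C) - [P(w) H(C|w) + P(not w) H(C|not w)]. The function names, the token-set document representation, the averaging over a keyword set, and the toy corpus are all assumptions for illustration. None of this is PKEA or the authors' own measure.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Set


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the class distribution in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(keyword: str, docs: Sequence[Set[str]],
                     labels: Sequence[Hashable]) -> float:
    """IG(w) = H(C) - [P(w) H(C|w) + P(not w) H(C|not w)].

    Assumption: each document is a set of tokens, and the feature is binary
    (does `keyword` occur in the document or not).
    """
    n = len(labels)
    if n == 0:
        return 0.0
    with_kw = [y for d, y in zip(docs, labels) if keyword in d]
    without_kw = [y for d, y in zip(docs, labels) if keyword not in d]
    # Class entropy remaining after splitting the corpus on keyword presence.
    conditional = (len(with_kw) / n) * entropy(with_kw) \
        + (len(without_kw) / n) * entropy(without_kw)
    return entropy(labels) - conditional


def mean_information_gain(keywords: Sequence[str], docs: Sequence[Set[str]],
                          labels: Sequence[Hashable]) -> float:
    """Hypothetical aggregate: average IG over an extracted keyword set."""
    if not keywords:
        return 0.0
    return sum(information_gain(k, docs, labels) for k in keywords) / len(keywords)


if __name__ == "__main__":
    # Invented toy corpus: token sets for four documents in two classes.
    docs = [{"lidar", "laser"}, {"lidar", "point"},
            {"radar", "doppler"}, {"radar", "antenna"}]
    labels = ["lidar", "lidar", "radar", "radar"]
    print(information_gain("lidar", docs, labels))            # 1.0
    print(round(information_gain("laser", docs, labels), 4))  # 0.3113
```

Under this measure, a keyword that occurs only in documents of one class (here "lidar") removes all class uncertainty and scores 1 bit, while a keyword that occurs in just one document of a class ("laser") scores much lower.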
