
Imputation of missing information in worldwide patent data

We present a general method for imputing missing information in the Worldwide Patent Statistical Database (PATSTAT) and make the resulting datasets publicly available. The PATSTAT database is the de facto standard for academic research using patent data. Complete information on patents is essential...


Bibliographic Details
Main authors: de Rassenfosse, Gaétan; Seliger, Florian
Format: Online Article Text
Language: English
Published: Elsevier 2020
Subjects: Data Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7744924/
https://www.ncbi.nlm.nih.gov/pubmed/33354599
http://dx.doi.org/10.1016/j.dib.2020.106615
_version_ 1783624512845119488
author de Rassenfosse, Gaétan
Seliger, Florian
author_facet de Rassenfosse, Gaétan
Seliger, Florian
author_sort de Rassenfosse, Gaétan
collection PubMed
description We present a general method for imputing missing information in the Worldwide Patent Statistical Database (PATSTAT) and make the resulting datasets publicly available. The PATSTAT database is the de facto standard for academic research using patent data. Complete information on patents is essential to obtain an accurate picture of technological activities across countries and over time. However, the coverage of the database is far from complete. Our data imputation method exploits detailed institutional knowledge about the international patent system, and we codify it in a SQL algorithm. We provide two datasets related to the imputation of missing country codes and missing technology classification. We also release the algorithm that can be easily adapted to impute other pieces of information that are missing in PATSTAT.
format Online
Article
Text
id pubmed-7744924
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-77449242020-12-21 Imputation of missing information in worldwide patent data de Rassenfosse, Gaétan Seliger, Florian Data Brief Data Article We present a general method for imputing missing information in the Worldwide Patent Statistical Database (PATSTAT) and make the resulting datasets publicly available. The PATSTAT database is the de facto standard for academic research using patent data. Complete information on patents is essential to obtain an accurate picture of technological activities across countries and over time. However, the coverage of the database is far from complete. Our data imputation method exploits detailed institutional knowledge about the international patent system, and we codify it in a SQL algorithm. We provide two datasets related to the imputation of missing country codes and missing technology classification. We also release the algorithm that can be easily adapted to impute other pieces of information that are missing in PATSTAT. Elsevier 2020-12-05 /pmc/articles/PMC7744924/ /pubmed/33354599 http://dx.doi.org/10.1016/j.dib.2020.106615 Text en © 2020 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Data Article
de Rassenfosse, Gaétan
Seliger, Florian
Imputation of missing information in worldwide patent data
title Imputation of missing information in worldwide patent data
title_full Imputation of missing information in worldwide patent data
title_fullStr Imputation of missing information in worldwide patent data
title_full_unstemmed Imputation of missing information in worldwide patent data
title_short Imputation of missing information in worldwide patent data
title_sort imputation of missing information in worldwide patent data
topic Data Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7744924/
https://www.ncbi.nlm.nih.gov/pubmed/33354599
http://dx.doi.org/10.1016/j.dib.2020.106615
work_keys_str_mv AT derassenfossegaetan imputationofmissinginformationinworldwidepatentdata
AT seligerflorian imputationofmissinginformationinworldwidepatentdata