
Investigating Software Usage in the Social Sciences: A Knowledge Graph Approach

Bibliographic Details
Main Authors: Schindler, David; Zapilko, Benjamin; Krüger, Frank
Format: Online, Article, Text
Language: English
Published: 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7250610/
http://dx.doi.org/10.1007/978-3-030-49461-2_16
Collection: PubMed
Record ID: pubmed-7250610
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed

Abstract
Knowledge about the software used in scientific investigations is necessary for several reasons, including provenance of the results, measuring software impact to give credit to developers, and bibliometric software citation analysis in general. Additionally, providing information about whether and how the software and its source code are available allows an assessment of the state and role of open-source software in science. While such analyses can be done manually, large-scale analyses require automated methods of information extraction and linking. In this paper, we present SoftwareKG, a knowledge graph that contains information about software mentions from more than 51,000 scientific articles from the social sciences. A silver standard corpus, created by a distant and weak supervision approach, and a gold standard corpus, created by manual annotation, were used to train an LSTM-based neural network to identify software mentions in scientific articles. The model achieves an F-score of 0.82 on exact matches. As a result, we identified more than 133,000 software mentions. For entity disambiguation, we used the public-domain knowledge base DBpedia. Furthermore, we linked the entities of the knowledge graph to other knowledge bases such as the Microsoft Academic Knowledge Graph, the Software Ontology, and Wikidata. Finally, we illustrate how SoftwareKG can be used to assess the role of software in the social sciences.
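
The abstract reports an F-score of .82 for exact matches. In span-based entity recognition this usually means a predicted mention counts as correct only when its start and end boundaries coincide with a manually annotated mention; partial overlaps count as errors. The Python sketch below illustrates that metric under stated assumptions: mentions are represented as hypothetical (document id, start offset, end offset) tuples, and `exact_match_f1` is an illustrative helper, not the authors' evaluation code, which this record does not describe.

```python
from typing import Set, Tuple

# Assumed representation (illustrative only, not taken from the paper):
# a mention is (document_id, start_char_offset, end_char_offset).
Span = Tuple[str, int, int]


def exact_match_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Span-level F1 in which a prediction is a true positive only if its
    boundaries match a gold mention exactly (hypothetical helper)."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = {("a1", 10, 14), ("a1", 40, 45), ("a2", 3, 9)}
    # ("a1", 40, 44) overlaps a gold mention but ends one character early,
    # so exact matching treats it as an error.
    predicted = {("a1", 10, 14), ("a1", 40, 44), ("a2", 3, 9), ("a2", 20, 25)}
    print(round(exact_match_f1(gold, predicted), 4))  # 0.5714
```

Here two of four predictions match exactly, so precision is 0.5 and recall is 2/3, giving F1 = 4/7. A lenient (overlap-based) metric would have credited the off-by-one span, which is why exact-match scores are typically the stricter figure.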

Published in: The Semantic Web
Publication Date: 2020-05-07
Copyright: © Springer Nature Switzerland AG 2020
Rights: This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.