The New DBpedia Release Cycle: Increasing Agility and Efficiency in Knowledge Extraction Workflows

Since its inception in 2007, DBpedia has been constantly releasing open data in RDF, extracted from various Wikimedia projects using a complex software system called the DBpedia Information Extraction Framework (DIEF). For the past 12 years, the software received a plethora of extensions by the comm...

Bibliographic Details
Main Authors: Hofer, Marvin, Hellmann, Sebastian, Dojchinovski, Milan, Frey, Johannes
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7586439/
http://dx.doi.org/10.1007/978-3-030-59833-4_1
_version_ 1783599997792550912
author Hofer, Marvin
Hellmann, Sebastian
Dojchinovski, Milan
Frey, Johannes
author_facet Hofer, Marvin
Hellmann, Sebastian
Dojchinovski, Milan
Frey, Johannes
author_sort Hofer, Marvin
collection PubMed
description Since its inception in 2007, DBpedia has been constantly releasing open data in RDF, extracted from various Wikimedia projects using a complex software system called the DBpedia Information Extraction Framework (DIEF). For the past 12 years, the software received a plethora of extensions by the community, which positively affected the size and data quality. Due to the increase in size and complexity, the release process was facing huge delays (from 12 to 17 months cycle), thus impacting the agility of the development. In this paper, we describe the new DBpedia release cycle including our innovative release workflow, which allows development teams (in particular those who publish large, open data) to implement agile, cost-efficient processes and scale up productivity. The DBpedia release workflow has been re-engineered, its new primary focus is on productivity and agility, to address the challenges of size and complexity. At the same time, quality is assured by implementing a comprehensive testing methodology. We run an experimental evaluation and argue that the implemented measures increase agility and allow for cost-effective quality-control and debugging and thus achieve a higher level of maintainability. As a result, DBpedia now publishes regular (i.e. monthly) releases with over 21 billion triples with minimal publishing effort.
format Online
Article
Text
id pubmed-7586439
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-75864392020-10-27 The New DBpedia Release Cycle: Increasing Agility and Efficiency in Knowledge Extraction Workflows Hofer, Marvin Hellmann, Sebastian Dojchinovski, Milan Frey, Johannes Semantic Systems. In the Era of Knowledge Graphs Article Since its inception in 2007, DBpedia has been constantly releasing open data in RDF, extracted from various Wikimedia projects using a complex software system called the DBpedia Information Extraction Framework (DIEF). For the past 12 years, the software received a plethora of extensions by the community, which positively affected the size and data quality. Due to the increase in size and complexity, the release process was facing huge delays (from 12 to 17 months cycle), thus impacting the agility of the development. In this paper, we describe the new DBpedia release cycle including our innovative release workflow, which allows development teams (in particular those who publish large, open data) to implement agile, cost-efficient processes and scale up productivity. The DBpedia release workflow has been re-engineered, its new primary focus is on productivity and agility, to address the challenges of size and complexity. At the same time, quality is assured by implementing a comprehensive testing methodology. We run an experimental evaluation and argue that the implemented measures increase agility and allow for cost-effective quality-control and debugging and thus achieve a higher level of maintainability. As a result, DBpedia now publishes regular (i.e. monthly) releases with over 21 billion triples with minimal publishing effort . 
2020-10-27 /pmc/articles/PMC7586439/ http://dx.doi.org/10.1007/978-3-030-59833-4_1 Text en © The Author(s) 2020 Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
spellingShingle Article
Hofer, Marvin
Hellmann, Sebastian
Dojchinovski, Milan
Frey, Johannes
The New DBpedia Release Cycle: Increasing Agility and Efficiency in Knowledge Extraction Workflows
title The New DBpedia Release Cycle: Increasing Agility and Efficiency in Knowledge Extraction Workflows
title_full The New DBpedia Release Cycle: Increasing Agility and Efficiency in Knowledge Extraction Workflows
title_fullStr The New DBpedia Release Cycle: Increasing Agility and Efficiency in Knowledge Extraction Workflows
title_full_unstemmed The New DBpedia Release Cycle: Increasing Agility and Efficiency in Knowledge Extraction Workflows
title_short The New DBpedia Release Cycle: Increasing Agility and Efficiency in Knowledge Extraction Workflows
title_sort new dbpedia release cycle: increasing agility and efficiency in knowledge extraction workflows
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7586439/
http://dx.doi.org/10.1007/978-3-030-59833-4_1
work_keys_str_mv AT hofermarvin thenewdbpediareleasecycleincreasingagilityandefficiencyinknowledgeextractionworkflows
AT hellmannsebastian thenewdbpediareleasecycleincreasingagilityandefficiencyinknowledgeextractionworkflows
AT dojchinovskimilan thenewdbpediareleasecycleincreasingagilityandefficiencyinknowledgeextractionworkflows
AT freyjohannes thenewdbpediareleasecycleincreasingagilityandefficiencyinknowledgeextractionworkflows
AT hofermarvin newdbpediareleasecycleincreasingagilityandefficiencyinknowledgeextractionworkflows
AT hellmannsebastian newdbpediareleasecycleincreasingagilityandefficiencyinknowledgeextractionworkflows
AT dojchinovskimilan newdbpediareleasecycleincreasingagilityandefficiencyinknowledgeextractionworkflows
AT freyjohannes newdbpediareleasecycleincreasingagilityandefficiencyinknowledgeextractionworkflows