Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction

End-to-end knowledge base construction systems using statistical inference are enabling more people to automatically extract high-quality domain-specific information from unstructured data. As a result of deploying the DeepDive framework across several domains, we found new challenges in debugging and i...

Bibliographic Details
Main Authors: Shin, Jaeho, Ré, Christopher, Cafarella, Michael
Format: Online Article Text
Language: English
Published: 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4852148/
https://www.ncbi.nlm.nih.gov/pubmed/27144082
http://dx.doi.org/10.14778/2824032.2824101
author Shin, Jaeho
Ré, Christopher
Cafarella, Michael
author_facet Shin, Jaeho
Ré, Christopher
Cafarella, Michael
author_sort Shin, Jaeho
collection PubMed
description End-to-end knowledge base construction systems using statistical inference are enabling more people to automatically extract high-quality domain-specific information from unstructured data. As a result of deploying the DeepDive framework across several domains, we found new challenges in debugging and improving such end-to-end systems to construct high-quality knowledge bases. DeepDive has an iterative development cycle in which users improve the data. To help our users, we needed to develop principles for analyzing the system's error as well as provide tooling for inspecting and labeling various data products of the system. We created guidelines for error analysis modeled after our colleagues' best practices, in which data labeling plays a critical role in every step of the analysis. To enable more productive and systematic data labeling, we created Mindtagger, a versatile tool that can be configured to support a wide range of tasks. In this demonstration, we show in detail what data labeling tasks are modeled in our error analysis guidelines and how each of them is performed using Mindtagger.
format Online
Article
Text
id pubmed-4852148
institution National Center for Biotechnology Information
language English
publishDate 2015
record_format MEDLINE/PubMed
spelling pubmed-4852148 2016-05-01 Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction Shin, Jaeho Ré, Christopher Cafarella, Michael Proceedings VLDB Endowment Article End-to-end knowledge base construction systems using statistical inference are enabling more people to automatically extract high-quality domain-specific information from unstructured data. As a result of deploying the DeepDive framework across several domains, we found new challenges in debugging and improving such end-to-end systems to construct high-quality knowledge bases. DeepDive has an iterative development cycle in which users improve the data. To help our users, we needed to develop principles for analyzing the system's error as well as provide tooling for inspecting and labeling various data products of the system. We created guidelines for error analysis modeled after our colleagues' best practices, in which data labeling plays a critical role in every step of the analysis. To enable more productive and systematic data labeling, we created Mindtagger, a versatile tool that can be configured to support a wide range of tasks. In this demonstration, we show in detail what data labeling tasks are modeled in our error analysis guidelines and how each of them is performed using Mindtagger. 2015-08 /pmc/articles/PMC4852148/ /pubmed/27144082 http://dx.doi.org/10.14778/2824032.2824101 Text en This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/. Obtain permission prior to any use beyond those covered by the license. Contact copyright holder by emailing info@vldb.org. Articles from this volume were invited to present their results at the 41st International Conference on Very Large Data Bases, August 31st - September 4th 2015, Kohala Coast, Hawaii.
title Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4852148/
https://www.ncbi.nlm.nih.gov/pubmed/27144082
http://dx.doi.org/10.14778/2824032.2824101