Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction
End-to-end knowledge base construction systems using statistical inference are enabling more people to automatically extract high-quality domain-specific information from unstructured data. As a result of deploying DeepDive framework across several domains, we found new challenges in debugging and i...
Main Authors: | Shin, Jaeho; Ré, Christopher; Cafarella, Michael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4852148/ https://www.ncbi.nlm.nih.gov/pubmed/27144082 http://dx.doi.org/10.14778/2824032.2824101 |
_version_ | 1782429893488803840 |
---|---|
author | Shin, Jaeho Ré, Christopher Cafarella, Michael |
author_facet | Shin, Jaeho Ré, Christopher Cafarella, Michael |
author_sort | Shin, Jaeho |
collection | PubMed |
description | End-to-end knowledge base construction systems using statistical inference are enabling more people to automatically extract high-quality domain-specific information from unstructured data. As a result of deploying DeepDive framework across several domains, we found new challenges in debugging and improving such end-to-end systems to construct high-quality knowledge bases. DeepDive has an iterative development cycle in which users improve the data. To help our users, we needed to develop principles for analyzing the system's error as well as provide tooling for inspecting and labeling various data products of the system. We created guidelines for error analysis modeled after our colleagues' best practices, in which data labeling plays a critical role in every step of the analysis. To enable more productive and systematic data labeling, we created Mindtagger, a versatile tool that can be configured to support a wide range of tasks. In this demonstration, we show in detail what data labeling tasks are modeled in our error analysis guidelines and how each of them is performed using Mindtagger. |
format | Online Article Text |
id | pubmed-4852148 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
record_format | MEDLINE/PubMed |
spelling | pubmed-48521482016-05-01 Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction Shin, Jaeho Ré, Christopher Cafarella, Michael Proceedings VLDB Endowment Article End-to-end knowledge base construction systems using statistical inference are enabling more people to automatically extract high-quality domain-specific information from unstructured data. As a result of deploying DeepDive framework across several domains, we found new challenges in debugging and improving such end-to-end systems to construct high-quality knowledge bases. DeepDive has an iterative development cycle in which users improve the data. To help our users, we needed to develop principles for analyzing the system's error as well as provide tooling for inspecting and labeling various data products of the system. We created guidelines for error analysis modeled after our colleagues' best practices, in which data labeling plays a critical role in every step of the analysis. To enable more productive and systematic data labeling, we created Mindtagger, a versatile tool that can be configured to support a wide range of tasks. In this demonstration, we show in detail what data labeling tasks are modeled in our error analysis guidelines and how each of them is performed using Mindtagger. 2015-08 /pmc/articles/PMC4852148/ /pubmed/27144082 http://dx.doi.org/10.14778/2824032.2824101 Text en This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/. Obtain permission prior to any use beyond those covered by the license. Contact copyright holder by emailing info@vldb.org. Articles from this volume were invited to present their results at the 41st International Conference on Very Large Data Bases, August 31st - September 4th 2015, Kohala Coast, Hawaii. |
spellingShingle | Article Shin, Jaeho Ré, Christopher Cafarella, Michael Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction |
title | Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction |
title_full | Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction |
title_fullStr | Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction |
title_full_unstemmed | Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction |
title_short | Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction |
title_sort | mindtagger: a demonstration of data labeling in knowledge base construction |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4852148/ https://www.ncbi.nlm.nih.gov/pubmed/27144082 http://dx.doi.org/10.14778/2824032.2824101 |
work_keys_str_mv | AT shinjaeho mindtaggerademonstrationofdatalabelinginknowledgebaseconstruction AT rechristopher mindtaggerademonstrationofdatalabelinginknowledgebaseconstruction AT cafarellamichael mindtaggerademonstrationofdatalabelinginknowledgebaseconstruction |