
How to Get the Most out of Your Curation Effort

Large-scale annotation efforts typically involve several experts who may disagree with each other. We propose an approach for modeling disagreements among experts that allows providing each annotation with a confidence value (i.e., the posterior probability that it is correct). Our approach allows computing a certainty level for individual annotations, given annotator-specific parameters estimated from data. We developed two probabilistic models for performing this analysis, compared these models using computer simulation, and tested each model's actual performance based on a large data set generated by human annotators specifically for this study. We show that even in the worst-case scenario, when all annotators disagree, our approach allows us to significantly increase the probability of choosing the correct annotation. Along with this publication we make publicly available a corpus of 10,000 sentences annotated according to several cardinal dimensions that we have introduced in earlier work. The 10,000 sentences were all 3-fold annotated by a group of eight experts, while a 1,000-sentence subset was further 5-fold annotated by five new experts. While the presented data represent a specialized curation task, our modeling approach is general; most data annotation studies could benefit from our methodology.
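The core idea in the abstract, assigning each annotation a posterior probability of being correct from annotator-specific accuracy parameters, can be illustrated with a minimal sketch. This is not the authors' actual model (the paper develops two more elaborate probabilistic models); it is a naive independent-annotator Bayes rule, and the function name, accuracy values, and label set below are illustrative assumptions only.

```python
# Minimal sketch (NOT the paper's model): a naive-Bayes-style posterior
# over candidate labels, assuming annotators err independently and that
# an annotator's errors are spread evenly over the other labels.

def label_posteriors(votes, accuracies, labels, prior=None):
    """votes[i] is annotator i's chosen label; accuracies[i] is the
    estimated probability that annotator i labels correctly.
    Returns {label: posterior probability that it is the true label}."""
    k = len(labels)
    if prior is None:
        prior = {lab: 1.0 / k for lab in labels}  # uniform prior
    scores = {}
    for true in labels:
        p = prior[true]
        for vote, acc in zip(votes, accuracies):
            # Agreement with the hypothesized true label has probability
            # acc; a wrong vote is assumed uniform over the other labels.
            p *= acc if vote == true else (1 - acc) / (k - 1)
        scores[true] = p
    z = sum(scores.values())  # normalize to get posteriors
    return {lab: s / z for lab, s in scores.items()}

# Worst case from the abstract: three annotators disagree three ways.
# The most accurate annotator's choice still gets the highest posterior,
# so a confidence-weighted pick beats an arbitrary tie-break.
post = label_posteriors(
    votes=["A", "B", "C"],
    accuracies=[0.9, 0.7, 0.6],  # illustrative per-annotator accuracies
    labels=["A", "B", "C"],
)
```

Even in this all-disagree case, the posterior concentrates on the label chosen by the most reliable annotator, which is the qualitative behavior the abstract claims for its (richer) models.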


Bibliographic Details

Main Authors: Rzhetsky, Andrey, Shatkay, Hagit, Wilbur, W. John
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2678295/
https://www.ncbi.nlm.nih.gov/pubmed/19461884
http://dx.doi.org/10.1371/journal.pcbi.1000391
author Rzhetsky, Andrey
Shatkay, Hagit
Wilbur, W. John
author_facet Rzhetsky, Andrey
Shatkay, Hagit
Wilbur, W. John
author_sort Rzhetsky, Andrey
collection PubMed
description Large-scale annotation efforts typically involve several experts who may disagree with each other. We propose an approach for modeling disagreements among experts that allows providing each annotation with a confidence value (i.e., the posterior probability that it is correct). Our approach allows computing a certainty level for individual annotations, given annotator-specific parameters estimated from data. We developed two probabilistic models for performing this analysis, compared these models using computer simulation, and tested each model's actual performance based on a large data set generated by human annotators specifically for this study. We show that even in the worst-case scenario, when all annotators disagree, our approach allows us to significantly increase the probability of choosing the correct annotation. Along with this publication we make publicly available a corpus of 10,000 sentences annotated according to several cardinal dimensions that we have introduced in earlier work. The 10,000 sentences were all 3-fold annotated by a group of eight experts, while a 1,000-sentence subset was further 5-fold annotated by five new experts. While the presented data represent a specialized curation task, our modeling approach is general; most data annotation studies could benefit from our methodology.
format Text
id pubmed-2678295
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-26782952009-05-22 How to Get the Most out of Your Curation Effort Rzhetsky, Andrey Shatkay, Hagit Wilbur, W. John PLoS Comput Biol Research Article Large-scale annotation efforts typically involve several experts who may disagree with each other. We propose an approach for modeling disagreements among experts that allows providing each annotation with a confidence value (i.e., the posterior probability that it is correct). Our approach allows computing certainty-level for individual annotations, given annotator-specific parameters estimated from data. We developed two probabilistic models for performing this analysis, compared these models using computer simulation, and tested each model's actual performance, based on a large data set generated by human annotators specifically for this study. We show that even in the worst-case scenario, when all annotators disagree, our approach allows us to significantly increase the probability of choosing the correct annotation. Along with this publication we make publicly available a corpus of 10,000 sentences annotated according to several cardinal dimensions that we have introduced in earlier work. The 10,000 sentences were all 3-fold annotated by a group of eight experts, while a 1,000-sentence subset was further 5-fold annotated by five new experts. While the presented data represent a specialized curation task, our modeling approach is general; most data annotation studies could benefit from our methodology. Public Library of Science 2009-05-22 /pmc/articles/PMC2678295/ /pubmed/19461884 http://dx.doi.org/10.1371/journal.pcbi.1000391 Text en This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. 
https://creativecommons.org/publicdomain/zero/1.0/
spellingShingle Research Article
Rzhetsky, Andrey
Shatkay, Hagit
Wilbur, W. John
How to Get the Most out of Your Curation Effort
title How to Get the Most out of Your Curation Effort
title_full How to Get the Most out of Your Curation Effort
title_fullStr How to Get the Most out of Your Curation Effort
title_full_unstemmed How to Get the Most out of Your Curation Effort
title_short How to Get the Most out of Your Curation Effort
title_sort how to get the most out of your curation effort
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2678295/
https://www.ncbi.nlm.nih.gov/pubmed/19461884
http://dx.doi.org/10.1371/journal.pcbi.1000391
work_keys_str_mv AT rzhetskyandrey howtogetthemostoutofyourcurationeffort
AT shatkayhagit howtogetthemostoutofyourcurationeffort
AT wilburwjohn howtogetthemostoutofyourcurationeffort