
Hybrid curation of gene–mutation relations combining automated extraction and crowdsourcing

Background: This article describes capture of biological information using a hybrid approach that combines natural language processing to extract biological entities and crowdsourcing with annotators recruited via Amazon Mechanical Turk to judge correctness of candidate biological relations. These t...


Bibliographic Details
Main authors: Burger, John D., Doughty, Emily, Khare, Ritu, Wei, Chih-Hsuan, Mishra, Rajashree, Aberdeen, John, Tresner-Kirsch, David, Wellner, Ben, Kann, Maricel G., Lu, Zhiyong, Hirschman, Lynette
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4170591/
https://www.ncbi.nlm.nih.gov/pubmed/25246425
http://dx.doi.org/10.1093/database/bau094
_version_ 1782335829678489600
author Burger, John D.
Doughty, Emily
Khare, Ritu
Wei, Chih-Hsuan
Mishra, Rajashree
Aberdeen, John
Tresner-Kirsch, David
Wellner, Ben
Kann, Maricel G.
Lu, Zhiyong
Hirschman, Lynette
author_sort Burger, John D.
collection PubMed
description Background: This article describes capture of biological information using a hybrid approach that combines natural language processing to extract biological entities and crowdsourcing with annotators recruited via Amazon Mechanical Turk to judge correctness of candidate biological relations. These techniques were applied to extract gene–mutation relations from biomedical abstracts with the goal of supporting production scale capture of gene–mutation–disease findings as an open source resource for personalized medicine. Results: The hybrid system could be configured to provide good performance for gene–mutation extraction (precision ∼82%; recall ∼70% against an expert-generated gold standard) at a cost of $0.76 per abstract. This demonstrates that crowd labor platforms such as Amazon Mechanical Turk can be used to recruit quality annotators, even in an application requiring subject matter expertise; aggregated Turker judgments for gene–mutation relations exceeded 90% accuracy. Over half of the precision errors were due to mismatches against the gold standard hidden from annotator view (e.g. incorrect EntrezGene identifier or incorrect mutation position extracted), or incomplete task instructions (e.g. the need to exclude nonhuman mutations). Conclusions: The hybrid curation model provides a readily scalable cost-effective approach to curation, particularly if coupled with expert human review to filter precision errors. We plan to generalize the framework and make it available as open source software. Database URL: http://www.mitre.org/publications/technical-papers/hybrid-curation-of-gene-mutation-relations-combining-automated
format Online
Article
Text
id pubmed-4170591
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-41705912014-09-25 Hybrid curation of gene–mutation relations combining automated extraction and crowdsourcing Burger, John D. Doughty, Emily Khare, Ritu Wei, Chih-Hsuan Mishra, Rajashree Aberdeen, John Tresner-Kirsch, David Wellner, Ben Kann, Maricel G. Lu, Zhiyong Hirschman, Lynette Database (Oxford) Original Article Background: This article describes capture of biological information using a hybrid approach that combines natural language processing to extract biological entities and crowdsourcing with annotators recruited via Amazon Mechanical Turk to judge correctness of candidate biological relations. These techniques were applied to extract gene–mutation relations from biomedical abstracts with the goal of supporting production scale capture of gene–mutation–disease findings as an open source resource for personalized medicine. Results: The hybrid system could be configured to provide good performance for gene–mutation extraction (precision ∼82%; recall ∼70% against an expert-generated gold standard) at a cost of $0.76 per abstract. This demonstrates that crowd labor platforms such as Amazon Mechanical Turk can be used to recruit quality annotators, even in an application requiring subject matter expertise; aggregated Turker judgments for gene–mutation relations exceeded 90% accuracy. Over half of the precision errors were due to mismatches against the gold standard hidden from annotator view (e.g. incorrect EntrezGene identifier or incorrect mutation position extracted), or incomplete task instructions (e.g. the need to exclude nonhuman mutations). Conclusions: The hybrid curation model provides a readily scalable cost-effective approach to curation, particularly if coupled with expert human review to filter precision errors. We plan to generalize the framework and make it available as open source software.
Database URL: http://www.mitre.org/publications/technical-papers/hybrid-curation-of-gene-mutation-relations-combining-automated Oxford University Press 2014-09-22 /pmc/articles/PMC4170591/ /pubmed/25246425 http://dx.doi.org/10.1093/database/bau094 Text en © The Author(s) 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Hybrid curation of gene–mutation relations combining automated extraction and crowdsourcing
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4170591/
https://www.ncbi.nlm.nih.gov/pubmed/25246425
http://dx.doi.org/10.1093/database/bau094