
A crowdsourcing workflow for extracting chemical-induced disease relations from free text

Relations between chemicals and diseases are one of the most queried biomedical interactions. Although expert manual curation is the standard method for extracting these relations from the literature, it is expensive and impractical to apply to large numbers of documents, and therefore alternative m...


Bibliographic Details
Main Authors:	Li, Tong Shu; Bravo, Àlex; Furlong, Laura I.; Good, Benjamin M.; Su, Andrew I.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2016
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4834205/ https://www.ncbi.nlm.nih.gov/pubmed/27087308 http://dx.doi.org/10.1093/database/baw051

_version_	1782427459823599616
author	Li, Tong Shu; Bravo, Àlex; Furlong, Laura I.; Good, Benjamin M.; Su, Andrew I.
author_sort	Li, Tong Shu
collection	PubMed
description	Relations between chemicals and diseases are one of the most queried biomedical interactions. Although expert manual curation is the standard method for extracting these relations from the literature, it is expensive and impractical to apply to large numbers of documents, and therefore alternative methods are required. We describe here a crowdsourcing workflow for extracting chemical-induced disease relations from free text as part of the BioCreative V Chemical Disease Relation challenge. Five non-expert workers on the CrowdFlower platform were shown each potential chemical-induced disease relation highlighted in the original source text and asked to make binary judgments about whether the text supported the relation. Worker responses were aggregated through voting, and relations receiving four or more votes were predicted as true. On the official evaluation dataset of 500 PubMed abstracts, the crowd attained a 0.505 F-score (0.475 precision, 0.540 recall), with a maximum theoretical recall of 0.751 due to errors with named entity recognition. The total crowdsourcing cost was $1290.67 ($2.58 per abstract) and took a total of 7 h. A qualitative error analysis revealed that 46.66% of sampled errors were due to task limitations and gold standard errors, indicating that performance can still be improved. All code and results are publicly available at https://github.com/SuLab/crowd_cid_relex Database URL: https://github.com/SuLab/crowd_cid_relex
format	Online Article Text
id	pubmed-4834205
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-4834205 2016-04-18 Database (Oxford) Oxford University Press 2016-04-16 /pmc/articles/PMC4834205/ /pubmed/27087308 http://dx.doi.org/10.1093/database/baw051 Text en © The Author(s) 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A crowdsourcing workflow for extracting chemical-induced disease relations from free text
title_sort	crowdsourcing workflow for extracting chemical-induced disease relations from free text
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4834205/ https://www.ncbi.nlm.nih.gov/pubmed/27087308 http://dx.doi.org/10.1093/database/baw051
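
The description above reports that five worker judgments per candidate relation were aggregated by voting, that relations with four or more votes were predicted as true, and that the result was scored by precision, recall and F-score. The following is a minimal sketch of that aggregation and scoring, not the authors' code (which is at the GitHub URL given in the description). The function names, the relation identifiers and the toy judgments are hypothetical; only the threshold of four and the reported figures come from the record.

```python
from collections import defaultdict

# From the description: relations receiving four or more (of five) votes are predicted true.
VOTE_THRESHOLD = 4


def aggregate_votes(judgments, threshold=VOTE_THRESHOLD):
    """Return the set of relation ids whose count of 'true' votes meets the threshold.

    judgments: iterable of (relation_id, worker_says_true) pairs (hypothetical format).
    """
    votes = defaultdict(int)
    for relation_id, says_true in judgments:
        votes[relation_id] += int(says_true)
    return {rel for rel, count in votes.items() if count >= threshold}


def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted, gold):
    """Score a set of predicted relations against a set of gold-standard relations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


# Toy data (hypothetical): three candidate relations, five binary judgments each.
toy_judgments = (
    [("chem_A|disease_X", True)] * 5
    + [("chem_A|disease_Y", True)] * 4 + [("chem_A|disease_Y", False)]
    + [("chem_B|disease_X", True)] * 3 + [("chem_B|disease_X", False)] * 2
)
toy_gold = {"chem_A|disease_X", "chem_B|disease_X"}

toy_predicted = aggregate_votes(toy_judgments)
toy_scores = precision_recall_f1(toy_predicted, toy_gold)

# Consistency checks on the figures reported in the description.
reported_f = f_score(0.475, 0.540)       # reported as 0.505
cost_per_abstract = 1290.67 / 500        # reported as $2.58
```

On the toy data, the relation with only three votes falls below the threshold and is not predicted, which costs recall when that relation is in the gold standard. The last two lines recompute the reported F-score and per-abstract cost from the other numbers in the description; both agree with the record after rounding.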
