
Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text

Bibliographic Details
Main authors:	Bravo, Àlex; Li, Tong Shu; Su, Andrew I.; Good, Benjamin M.; Furlong, Laura I.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2016
Subjects:	Database Tool
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908671/ https://www.ncbi.nlm.nih.gov/pubmed/27307137 http://dx.doi.org/10.1093/database/baw094

author	Bravo, Àlex; Li, Tong Shu; Su, Andrew I.; Good, Benjamin M.; Furlong, Laura I.
collection	PubMed
description	Drug toxicity is a major concern for both regulatory agencies and the pharmaceutical industry. In this context, text-mining methods that identify drug side effects in free text are key to developing up-to-date knowledge sources on adverse drug reactions. We present a new system for identifying drug side effects in the literature that combines three approaches: machine learning, rule-based and knowledge-based. The system was developed to address Task 3.B of the BioCreative V challenge (BC5), which deals with chemical-induced disease (CID) relations. The first two approaches identify relations at the sentence level, while the knowledge-based approach is applied at both the sentence and abstract levels. The machine learning method is based on the BeFree system and uses two corpora as training data: the annotated data provided by the CID task organizers and a new CID corpus developed by crowdsourcing. A different combination of results from the three strategies was selected for each run of the challenge. In the final evaluation, the system achieved the highest recall of the challenge (63%). Through an error analysis, we identified the main causes of misclassification and areas for improving our system, and highlighted the need for consistent gold-standard data sets to advance the state of the art in text mining of drug side effects. Database URL: https://zenodo.org/record/29887?ln=en#.VsL3yDLWR_V
format	Online Article Text
id	pubmed-4908671
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-4908671 2016-06-17 Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text Bravo, Àlex; Li, Tong Shu; Su, Andrew I.; Good, Benjamin M.; Furlong, Laura I. Database (Oxford) Database Tool Oxford University Press 2016-06-15 /pmc/articles/PMC4908671/ /pubmed/27307137 http://dx.doi.org/10.1093/database/baw094 Text en © The Author(s) 2016. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text

Similar Items