
Efficient chemical-disease identification and relationship extraction using Wikipedia to improve recall

Bibliographic Details
Main Authors:	Lowe, Daniel M.; O’Boyle, Noel M.; Sayle, Roger A.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2016
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4825350/ https://www.ncbi.nlm.nih.gov/pubmed/27060160 http://dx.doi.org/10.1093/database/baw039

author	Lowe, Daniel M.; O’Boyle, Noel M.; Sayle, Roger A.
collection	PubMed
description	Awareness of the adverse effects of chemicals is important in biomedical research and healthcare. Text mining can allow timely and low-cost extraction of this knowledge from the biomedical literature. We extended our text mining solution, LeadMine, to identify diseases and chemical-induced disease relationships (CIDs). LeadMine is a dictionary/grammar-based entity recognizer and was used to recognize and normalize both chemicals and diseases to Medical Subject Headings (MeSH) IDs. The disease lexicon was obtained from three sources: MeSH, the Disease Ontology and Wikipedia. The Wikipedia dictionary was derived from pages with a disease/symptom box, or from pages whose title appeared in the lexicon. Composite entities (e.g. heart and lung disease) were detected and mapped to their composite MeSH IDs. For CIDs, we developed a simple pattern-based system to find relationships within the same sentence. Our system was evaluated in the BioCreative V Chemical–Disease Relation task and achieved very good results on the test set for both disease concept ID recognition (F1-score: 86.12%) and CIDs (F1-score: 52.20%). As our system was over an order of magnitude faster than other solutions evaluated on the task, we were able to apply the same system to the entirety of MEDLINE, allowing us to extract a collection of over 250 000 distinct CIDs.
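The recognition step described in the abstract (dictionary lookup with normalization to concept IDs, plus detection of composite entities such as "heart and lung disease") can be illustrated with a minimal Python sketch. This is not LeadMine's implementation or API. The toy lexicon, its placeholder concept IDs (standing in for MeSH IDs), the greedy longest-match strategy, the "<a> and <b> <head>" composite rule and the "|"-joined composite ID are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: lowercased surface form -> concept ID.
# The IDs are placeholders standing in for MeSH descriptor IDs.
LEXICON: Dict[str, str] = {
    "heart disease": "ID_HEART_DISEASE",
    "lung disease": "ID_LUNG_DISEASE",
    "hypotension": "ID_HYPOTENSION",
}

# Longest lexicon entry, in tokens; bounds the longest-match search.
MAX_LEN = max(len(term.split()) for term in LEXICON)


def recognize(text: str) -> List[Tuple[str, str]]:
    """Return (mention, concept_id) pairs found in text, scanning left to right."""
    tokens = re.findall(r"\w+", text.lower())
    results: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Assumed composite rule: "<a> and <b> <head>" where both "<a> <head>"
        # and "<b> <head>" are known terms (single-token modifiers only).
        if i + 3 < len(tokens) and tokens[i + 1] == "and":
            a, b, head = tokens[i], tokens[i + 2], tokens[i + 3]
            first, second = f"{a} {head}", f"{b} {head}"
            if first in LEXICON and second in LEXICON:
                composite_id = f"{LEXICON[first]}|{LEXICON[second]}"
                results.append((f"{a} and {b} {head}", composite_id))
                i += 4
                continue
        # Otherwise take the longest lexicon term starting at token i.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in LEXICON:
                results.append((candidate, LEXICON[candidate]))
                i += n
                break
        else:
            i += 1  # no term starts at this token; move on
    return results


print(recognize("Cocaine caused heart and lung disease and hypotension."))
# → [('heart and lung disease', 'ID_HEART_DISEASE|ID_LUNG_DISEASE'), ('hypotension', 'ID_HYPOTENSION')]
```

Checking composites before the plain longest match keeps "heart and lung disease" from being reported only as "lung disease". A real grammar-based recognizer would handle multi-token modifiers, comma-separated lists and case-sensitive terms, which this sketch ignores.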
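The abstract describes CID extraction only as "a simple pattern-based system" working within a single sentence. The following Python sketch shows what such a system could look like. The trigger patterns ("X-induced Y", "X caused Y", "Y caused by X"), the Mention type, the make_mention helper and the placeholder concept IDs are all hypothetical; the abstract does not list the patterns the authors actually used.

```python
import re
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Mention:
    text: str
    concept_id: str
    start: int  # character offset of the mention in the sentence
    end: int    # exclusive end offset


# Hypothetical trigger patterns that must span the whole gap between two mentions.
# Chemical first: "cocaine-induced seizures", "cocaine caused seizures".
FORWARD = re.compile(r"\s*-?\s*(?:induced|caused|causes|causing)\s*", re.IGNORECASE)
# Disease first: "hypotension caused by propofol".
BACKWARD = re.compile(r"\s*(?:induced|caused)\s+by\s*", re.IGNORECASE)


def make_mention(sentence: str, text: str, concept_id: str) -> Mention:
    """Locate the first occurrence of text in sentence and wrap it as a Mention."""
    start = sentence.index(text)
    return Mention(text, concept_id, start, start + len(text))


def extract_cids(sentence: str, chemicals: List[Mention],
                 diseases: List[Mention]) -> Set[Tuple[str, str]]:
    """Return (chemical_id, disease_id) pairs linked by a trigger in one sentence."""
    pairs: Set[Tuple[str, str]] = set()
    for chem in chemicals:
        for dis in diseases:
            if chem.end <= dis.start:
                between = sentence[chem.end:dis.start]
                if FORWARD.fullmatch(between):
                    pairs.add((chem.concept_id, dis.concept_id))
            elif dis.end <= chem.start:
                between = sentence[dis.end:chem.start]
                if BACKWARD.fullmatch(between):
                    pairs.add((chem.concept_id, dis.concept_id))
    return pairs


sentence = "Cocaine-induced seizures were seen, and hypotension caused by propofol was rare."
chemicals = [make_mention(sentence, "Cocaine", "CHEM_COCAINE"),
             make_mention(sentence, "propofol", "CHEM_PROPOFOL")]
diseases = [make_mention(sentence, "seizures", "DIS_SEIZURES"),
            make_mention(sentence, "hypotension", "DIS_HYPOTENSION")]
print(sorted(extract_cids(sentence, chemicals, diseases)))
# → [('CHEM_COCAINE', 'DIS_SEIZURES'), ('CHEM_PROPOFOL', 'DIS_HYPOTENSION')]
```

Requiring the trigger to cover the entire gap between mentions favours precision over recall. Looser patterns that allow intervening words would find more pairs at the cost of more false positives.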
format	Online Article Text
id	pubmed-4825350
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-4825350 2016-04-11 Efficient chemical-disease identification and relationship extraction using Wikipedia to improve recall Lowe, Daniel M. O’Boyle, Noel M. Sayle, Roger A. Database (Oxford) Original Article Awareness of the adverse effects of chemicals is important in biomedical research and healthcare. Text mining can allow timely and low-cost extraction of this knowledge from the biomedical literature. We extended our text mining solution, LeadMine, to identify diseases and chemical-induced disease relationships (CIDs). LeadMine is a dictionary/grammar-based entity recognizer and was used to recognize and normalize both chemicals and diseases to Medical Subject Headings (MeSH) IDs. The disease lexicon was obtained from three sources: MeSH, the Disease Ontology and Wikipedia. The Wikipedia dictionary was derived from pages with a disease/symptom box, or those where the page title appeared in the lexicon. Composite entities (e.g. heart and lung disease) were detected and mapped to their composite MeSH IDs. For CIDs, we developed a simple pattern-based system to find relationships within the same sentence. Our system was evaluated in the BioCreative V Chemical–Disease Relation task and achieved very good results for both disease concept ID recognition (F1-score: 86.12%) and CIDs (F1-score: 52.20%) on the test set. As our system was over an order of magnitude faster than other solutions evaluated on the task, we were able to apply the same system to the entirety of MEDLINE allowing us to extract a collection of over 250 000 distinct CIDs. Oxford University Press 2016-04-08 /pmc/articles/PMC4825350/ /pubmed/27060160 http://dx.doi.org/10.1093/database/baw039 Text en © The Author(s) 2016. Published by Oxford University Press. 
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Efficient chemical-disease identification and relationship extraction using Wikipedia to improve recall
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4825350/ https://www.ncbi.nlm.nih.gov/pubmed/27060160 http://dx.doi.org/10.1093/database/baw039