Automatic extraction of gene-disease associations from literature using joint ensemble learning
A wealth of knowledge concerning relations between genes and their associated diseases is present in the biomedical literature. Mining these biological associations from the literature can provide immense support to research ranging from drug-targetable pathways to biomarker discovery. However, time and cost...
Main Authors: | Bhasuran, Balu; Natarajan, Jeyakumar |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6061985/ https://www.ncbi.nlm.nih.gov/pubmed/30048465 http://dx.doi.org/10.1371/journal.pone.0200699 |
_version_ | 1783342318392180736 |
---|---|
author | Bhasuran, Balu Natarajan, Jeyakumar |
author_facet | Bhasuran, Balu Natarajan, Jeyakumar |
author_sort | Bhasuran, Balu |
collection | PubMed |
description | A wealth of knowledge concerning relations between genes and their associated diseases is present in the biomedical literature. Mining these biological associations from the literature can provide immense support to research ranging from drug-targetable pathways to biomarker discovery. However, the time and cost of manual curation heavily slow it down. Biomedical text mining is therefore a crucial technology, and relation extraction shows promising results for exploring genes associated with diseases. We address this problem from a text mining perspective by developing automatic extraction of gene-disease associations from the literature using joint ensemble learning. In the proposed work, we employ a supervised machine learning approach in which a rich feature set covering conceptual, syntactic and semantic properties, jointly learned with word embeddings, is used to train an ensemble support vector machine for extracting gene-disease relations from four gold standard corpora. On evaluation, the machine learning approach shows promising results, with F-measures of 85.34%, 83.93%, 87.39% and 85.57% on the EUADR, GAD, CoMAGC and PolySearch corpora, respectively. We strongly believe that the presented novel approach, which combines a rich syntactic and semantic feature set with domain-specific word embeddings through ensemble support vector machines and is evaluated on four gold standard corpora, can act as a new baseline for future work in gene-disease relation extraction from the literature. |
format | Online Article Text |
id | pubmed-6061985 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6061985 2018-08-03 Automatic extraction of gene-disease associations from literature using joint ensemble learning Bhasuran, Balu Natarajan, Jeyakumar PLoS One Research Article Public Library of Science 2018-07-26 /pmc/articles/PMC6061985/ /pubmed/30048465 http://dx.doi.org/10.1371/journal.pone.0200699 Text en © 2018 Bhasuran, Natarajan http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Automatic extraction of gene-disease associations from literature using joint ensemble learning |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6061985/ https://www.ncbi.nlm.nih.gov/pubmed/30048465 http://dx.doi.org/10.1371/journal.pone.0200699 |
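
The description above reports results as F-measure and refers to an ensemble of classifiers. The following is a minimal sketch of how per-pair predictions from several base classifiers might be combined and scored. It assumes simple majority voting, which the abstract does not specify, and the votes, gold labels, and function names are hypothetical illustrations, not the authors' data or implementation.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common label among one instance's base-classifier votes."""
    return Counter(votes).most_common(1)[0][0]


def f_measure(gold, predicted, positive=1):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical votes from three base classifiers for five candidate
# gene-disease pairs (1 = association, 0 = no association).
votes_per_pair = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
gold = [1, 0, 1, 0, 1]  # hypothetical gold-standard labels

predicted = [majority_vote(v) for v in votes_per_pair]
score = f_measure(gold, predicted)
print(f"F-measure: {score:.2%}")
```

Using an odd number of base classifiers with binary labels avoids ties in the vote. Other combination schemes, such as weighted voting, would slot into `majority_vote` without changing the scoring step.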