Automatic extraction of gene-disease associations from literature using joint ensemble learning

A wealth of knowledge concerning relations between genes and their associated diseases is present in biomedical literature. Mining these biological associations from literature can provide immense support to research ranging from drug-targetable pathways to biomarker discovery. However, time and cost...

Bibliographic Details
Main Authors: Bhasuran, Balu, Natarajan, Jeyakumar
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6061985/
https://www.ncbi.nlm.nih.gov/pubmed/30048465
http://dx.doi.org/10.1371/journal.pone.0200699
author Bhasuran, Balu
Natarajan, Jeyakumar
collection PubMed
description A wealth of knowledge concerning relations between genes and their associated diseases is present in biomedical literature. Mining these biological associations from literature can provide immense support to research ranging from drug-targetable pathways to biomarker discovery. However, the time and cost of manual curation slow this work down considerably. In this scenario, biomedical text mining is a crucial technology, and relation extraction shows promising results for exploring genes associated with diseases. We addressed this problem from a text mining perspective by developing automatic extraction of gene-disease associations from the literature using joint ensemble learning. In the proposed work, we employ a supervised machine learning approach in which a rich feature set covering conceptual, syntactic and semantic properties, jointly learned with word embeddings, is used to train an ensemble of support vector machines for extracting gene-disease relations from four gold standard corpora. On evaluation, the machine learning approach shows promising results, with F-measures of 85.34%, 83.93%, 87.39% and 85.57% on the EUADR, GAD, CoMAGC and PolySearch corpora, respectively. We strongly believe that the presented novel approach, which combines a rich syntactic and semantic feature set with domain-specific word embeddings through ensemble support vector machines and is evaluated on four gold standard corpora, can act as a new baseline for future work in gene-disease relation extraction from literature.
format Online
Article
Text
id pubmed-6061985
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6061985 2018-08-03 Automatic extraction of gene-disease associations from literature using joint ensemble learning Bhasuran, Balu Natarajan, Jeyakumar PLoS One Research Article
Public Library of Science 2018-07-26 /pmc/articles/PMC6061985/ /pubmed/30048465 http://dx.doi.org/10.1371/journal.pone.0200699 Text en © 2018 Bhasuran, Natarajan http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Automatic extraction of gene-disease associations from literature using joint ensemble learning
topic Research Article