
Large Scale Application of Neural Network Based Semantic Role Labeling for Automated Relation Extraction from Biomedical Texts


Bibliographic Details
Main authors: Barnickel, Thorsten; Weston, Jason; Collobert, Ronan; Mewes, Hans-Werner; Stümpflen, Volker
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2712690/
https://www.ncbi.nlm.nih.gov/pubmed/19636432
http://dx.doi.org/10.1371/journal.pone.0006393
_version_ 1782169514878697472
author Barnickel, Thorsten
Weston, Jason
Collobert, Ronan
Mewes, Hans-Werner
Stümpflen, Volker
author_sort Barnickel, Thorsten
collection PubMed
description To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence-based approaches can process large text corpora like MEDLINE in an acceptable time but cannot extract any specific type of semantic relation. Semantic relation extraction methods based on syntax trees, on the other hand, are computationally expensive, and the generated trees are difficult to interpret. Several natural language processing (NLP) approaches for the biomedical domain focus specifically on detecting a limited set of relation types. Systems biology needs generic approaches that detect a multitude of relation types and can also process large text corpora, but very few systems meet both requirements. We introduce the use of SENNA (“Semantic Extraction using a Neural Network Architecture”), a fast and accurate neural network based Semantic Role Labeling (SRL) program, for the large scale extraction of semantic relations from the biomedical literature. A comparison of processing times of SENNA and other SRL systems or syntactical parsers used in the biomedical domain revealed that SENNA is the fastest Proposition Bank (PropBank) conforming SRL program currently available. A total of 89 million biomedical sentences were tagged with SENNA on a 100 node cluster within three days. The accuracy of the presented relation extraction approach was evaluated on two test sets of annotated sentences, resulting in precision/recall values of 0.71/0.43. We show that the accuracy as well as the processing speed of the proposed semantic relation extraction approach is sufficient for its large scale application on biomedical text. The proposed approach is highly generalizable regarding the supported relation types and appears to be especially suited for general-purpose, broad-scale text mining systems. The presented approach bridges the gap between fast, co-occurrence-based approaches lacking semantic relations and highly specialized, computationally demanding NLP approaches.
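The two sets of figures quoted in the description can be combined into derived numbers that make them easier to compare with other systems. The sketch below is illustrative only: the F1 score is not reported in this record and is simply the harmonic mean of the stated precision and recall, and the throughput estimate assumes the sentences were spread evenly over all 100 nodes and that the full three days were used, so it should be read as a rough lower bound. The function names are hypothetical.

```python
# Figures quoted in the description field of this record.
PRECISION = 0.71
RECALL = 0.43
SENTENCES = 89_000_000
NODES = 100
DAYS = 3  # "within three days" is treated as an upper bound on run time


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (not reported in the record)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_node_rate(sentences: int, nodes: int, days: float) -> float:
    """Average sentences per node per second.

    Assumes an even split across nodes and full use of the wall-clock time.
    """
    seconds = days * 24 * 60 * 60
    return sentences / (nodes * seconds)


if __name__ == "__main__":
    print(f"F1 = {f1_score(PRECISION, RECALL):.3f}")
    print(f"rate >= {per_node_rate(SENTENCES, NODES, DAYS):.2f} sentences/node/s")
```

Under these assumptions the stated values correspond to an F1 of roughly 0.54 and an average of about 3.4 sentences per node per second.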
format Text
id pubmed-2712690
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2712690 2009-07-28 Large Scale Application of Neural Network Based Semantic Role Labeling for Automated Relation Extraction from Biomedical Texts Barnickel, Thorsten Weston, Jason Collobert, Ronan Mewes, Hans-Werner Stümpflen, Volker PLoS One Research Article Public Library of Science 2009-07-28 /pmc/articles/PMC2712690/ /pubmed/19636432 http://dx.doi.org/10.1371/journal.pone.0006393 Text en Barnickel et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Large Scale Application of Neural Network Based Semantic Role Labeling for Automated Relation Extraction from Biomedical Texts
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2712690/
https://www.ncbi.nlm.nih.gov/pubmed/19636432
http://dx.doi.org/10.1371/journal.pone.0006393