
Potent pairing: ensemble of long short-term memory networks and support vector machine for chemical-protein relation extraction

Biomedical researchers regularly discover new interactions between chemical compounds/drugs and genes/proteins, and report them in research literature. Having knowledge about these interactions is crucially important in many research areas such as precision medicine and drug discovery. The BioCreati...


Bibliographic Details
Main Authors: Mehryary, Farrokh; Björne, Jari; Salakoski, Tapio; Ginter, Filip
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6310522/
https://www.ncbi.nlm.nih.gov/pubmed/30576487
http://dx.doi.org/10.1093/database/bay120
collection PubMed
description Biomedical researchers regularly discover new interactions between chemical compounds/drugs and genes/proteins and report them in the research literature. Knowledge of these interactions is crucial in many research areas, such as precision medicine and drug discovery. The BioCreative VI Task 5 (CHEMPROT) challenge promotes the development and evaluation of computer systems that can automatically recognize and extract statements of such interactions from biomedical literature. We participated in this challenge with a Support Vector Machine (SVM) system and a deep learning-based system (ST-ANN), and achieved an F-score of 60.99 for the task. After the shared task, we significantly improved the performance of the ST-ANN system. Additionally, we developed a new deep learning-based system (I-ANN) that considerably outperforms the ST-ANN system. Both the ST-ANN and I-ANN systems are centered around training an ensemble of artificial neural networks and use different bidirectional Long Short-Term Memory (LSTM) chains to represent the shortest dependency path and/or the full sentence. By combining the predictions of the SVM and I-ANN systems, we achieved an F-score of 63.10 for the task, improving our previous F-score by 2.11 percentage points. Our systems are fully open-source and publicly available. We highlight that the systems presented in this study are applicable not only to BioCreative VI Task 5: they can be re-trained to extract any type of relation of interest, with no modification of the source code, provided that a manually annotated corpus is supplied as training data in a specific file format.
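The abstract mentions representing the shortest dependency path between the two entities. As a rough illustration of that one idea only, the following minimal sketch finds such a path with a breadth-first search over an undirected view of a dependency parse. It is not the authors' implementation: the function name `shortest_dependency_path`, the example sentence, and the hand-written parse edges are all hypothetical, and a real system would obtain the edges from a dependency parser.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the token indices on a shortest path from source to target.

    `edges` is a list of (head, dependent) index pairs. Edge direction is
    ignored, which is an assumption of this sketch. Returns None when the
    two tokens are not connected.
    """
    # Build an undirected adjacency map from the dependency edges.
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, set()).add(dependent)
        graph.setdefault(dependent, set()).add(head)

    # Breadth-first search, remembering each node's predecessor.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical sentence and hand-written parse (not taken from the corpus).
tokens = ["Aspirin", "strongly", "inhibits", "the", "enzyme", "COX-1"]
edges = [(2, 0), (2, 1), (2, 5), (5, 3), (5, 4)]

path = shortest_dependency_path(edges, 0, 5)
path_tokens = [tokens[i] for i in path]
print(path_tokens)
```

In a pipeline of the kind the abstract describes, the tokens along such a path would be the input sequence for an LSTM chain; how the published systems encode the path is not stated in this record.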
format Online
Article
Text
id pubmed-6310522
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6310522 2019-01-07 Potent pairing: ensemble of long short-term memory networks and support vector machine for chemical-protein relation extraction. Mehryary, Farrokh; Björne, Jari; Salakoski, Tapio; Ginter, Filip. Database (Oxford). Original Article.
Oxford University Press 2018-11-06 /pmc/articles/PMC6310522/ /pubmed/30576487 http://dx.doi.org/10.1093/database/bay120 Text en © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Article