Potent pairing: ensemble of long short-term memory networks and support vector machine for chemical-protein relation extraction
Biomedical researchers regularly discover new interactions between chemical compounds/drugs and genes/proteins, and report them in research literature. Having knowledge about these interactions is crucially important in many research areas such as precision medicine and drug discovery. The BioCreati...
Main Authors: | Mehryary, Farrokh; Björne, Jari; Salakoski, Tapio; Ginter, Filip |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6310522/ https://www.ncbi.nlm.nih.gov/pubmed/30576487 http://dx.doi.org/10.1093/database/bay120 |
_version_ | 1783383451974500352 |
---|---|
author | Mehryary, Farrokh Björne, Jari Salakoski, Tapio Ginter, Filip |
author_facet | Mehryary, Farrokh Björne, Jari Salakoski, Tapio Ginter, Filip |
author_sort | Mehryary, Farrokh |
collection | PubMed |
description | Biomedical researchers regularly discover new interactions between chemical compounds/drugs and genes/proteins, and report them in research literature. Having knowledge about these interactions is crucially important in many research areas such as precision medicine and drug discovery. The BioCreative VI Task 5 (CHEMPROT) challenge promotes the development and evaluation of computer systems that can automatically recognize and extract statements of such interactions from biomedical literature. We participated in this challenge with a Support Vector Machine (SVM) system and a deep learning-based system (ST-ANN), and achieved an F-score of 60.99 for the task. After the shared task, we have significantly improved the performance of the ST-ANN system. Additionally, we have developed a new deep learning-based system (I-ANN) that considerably outperforms the ST-ANN system. Both ST-ANN and I-ANN systems are centered around training an ensemble of artificial neural networks and utilizing different bidirectional Long Short-Term Memory (LSTM) chains for representing the shortest dependency path and/or the full sentence. By combining the predictions of the SVM and the I-ANN systems, we achieved an F-score of 63.10 for the task, improving our previous F-score by 2.11 percentage points. Our systems are fully open-source and publicly available. We highlight that the systems we present in this study are not applicable only to the BioCreative VI Task 5, but can be effortlessly re-trained to extract any types of relations of interest, with no modifications of the source code required, if a manually annotated corpus is provided as training data in a specific file format. |
format | Online Article Text |
id | pubmed-6310522 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-63105222019-01-07 Potent pairing: ensemble of long short-term memory networks and support vector machine for chemical-protein relation extraction Mehryary, Farrokh Björne, Jari Salakoski, Tapio Ginter, Filip Database (Oxford) Original Article Biomedical researchers regularly discover new interactions between chemical compounds/drugs and genes/proteins, and report them in research literature. Having knowledge about these interactions is crucially important in many research areas such as precision medicine and drug discovery. The BioCreative VI Task 5 (CHEMPROT) challenge promotes the development and evaluation of computer systems that can automatically recognize and extract statements of such interactions from biomedical literature. We participated in this challenge with a Support Vector Machine (SVM) system and a deep learning-based system (ST-ANN), and achieved an F-score of 60.99 for the task. After the shared task, we have significantly improved the performance of the ST-ANN system. Additionally, we have developed a new deep learning-based system (I-ANN) that considerably outperforms the ST-ANN system. Both ST-ANN and I-ANN systems are centered around training an ensemble of artificial neural networks and utilizing different bidirectional Long Short-Term Memory (LSTM) chains for representing the shortest dependency path and/or the full sentence. By combining the predictions of the SVM and the I-ANN systems, we achieved an F-score of 63.10 for the task, improving our previous F-score by 2.11 percentage points. Our systems are fully open-source and publicly available. We highlight that the systems we present in this study are not applicable only to the BioCreative VI Task 5, but can be effortlessly re-trained to extract any types of relations of interest, with no modifications of the source code required, if a manually annotated corpus is provided as training data in a specific file format. 
Oxford University Press 2018-11-06 /pmc/articles/PMC6310522/ /pubmed/30576487 http://dx.doi.org/10.1093/database/bay120 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Mehryary, Farrokh Björne, Jari Salakoski, Tapio Ginter, Filip Potent pairing: ensemble of long short-term memory networks and support vector machine for chemical-protein relation extraction |
title | Potent pairing: ensemble of long short-term memory networks and support vector machine for chemical-protein relation extraction |
title_full | Potent pairing: ensemble of long short-term memory networks and support vector machine for chemical-protein relation extraction |
title_fullStr | Potent pairing: ensemble of long short-term memory networks and support vector machine for chemical-protein relation extraction |
title_full_unstemmed | Potent pairing: ensemble of long short-term memory networks and support vector machine for chemical-protein relation extraction |
title_short | Potent pairing: ensemble of long short-term memory networks and support vector machine for chemical-protein relation extraction |
title_sort | potent pairing: ensemble of long short-term memory networks and support vector machine for chemical-protein relation extraction |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6310522/ https://www.ncbi.nlm.nih.gov/pubmed/30576487 http://dx.doi.org/10.1093/database/bay120 |
work_keys_str_mv | AT mehryaryfarrokh potentpairingensembleoflongshorttermmemorynetworksandsupportvectormachineforchemicalproteinrelationextraction AT bjornejari potentpairingensembleoflongshorttermmemorynetworksandsupportvectormachineforchemicalproteinrelationextraction AT salakoskitapio potentpairingensembleoflongshorttermmemorynetworksandsupportvectormachineforchemicalproteinrelationextraction AT ginterfilip potentpairingensembleoflongshorttermmemorynetworksandsupportvectormachineforchemicalproteinrelationextraction |