Hierarchical bi-directional attention-based RNNs for supporting document classification on protein–protein interactions affected by genetic mutations
In this paper, we describe a hierarchical bi-directional attention-based Recurrent Neural Network (RNN) as a reusable sequence encoder architecture, which is used as sentence and document encoder for document classification. The sequence encoder is composed of two bi-directional RNNs equipped with a...
Main authors: | Fergadis, Aris; Baziotis, Christos; Pappas, Dimitris; Papageorgiou, Haris; Potamianos, Alexandros |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6105093/ https://www.ncbi.nlm.nih.gov/pubmed/30137284 http://dx.doi.org/10.1093/database/bay076 |
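
The record's description outlines attention over words and sentences plus a shortcut that joins the title vector to the document vector. The following is a minimal Python sketch of only those two ideas, not the authors' implementation (their code is at the Database URL given in the description). It assumes the bi-directional RNN outputs are already available as plain lists of numbers, uses a simple dot-product attention score, and skips the sentence-level RNN entirely. The names `attention_pool`, `document_features`, `word_ctx` and `sent_ctx` are hypothetical.

```python
import math
import random


def attention_pool(states, context):
    """Collapse a sequence of vectors into one vector using softmax attention.

    states:  list of equal-length vectors (stand-ins for bi-RNN hidden states).
    context: query vector of the same length (assumed to be learned in a real model).
    Returns the weighted sum of the states and the attention weights.
    """
    scores = [sum(s_i * c_i for s_i, c_i in zip(s, context)) for s in states]
    peak = max(scores)  # subtract the max before exponentiating for stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]
    return pooled, weights


def document_features(title_states, abstract_states, word_ctx, sent_ctx):
    """Build the document feature vector with a title shortcut connection.

    title_states:    word-level vectors for the title.
    abstract_states: one list of word-level vectors per abstract sentence.
    """
    title_vec, _ = attention_pool(title_states, word_ctx)
    sentence_vecs = [attention_pool(s, word_ctx)[0] for s in abstract_states]
    abstract_vec, _ = attention_pool(sentence_vecs, sent_ctx)
    # Shortcut: the title vector is concatenated directly to the abstract vector.
    return title_vec + abstract_vec


def random_sequence(length, dim):
    """Random vectors standing in for encoder outputs (illustration only)."""
    return [[random.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(length)]


random.seed(0)
DIM = 4
word_ctx = [random.uniform(-1.0, 1.0) for _ in range(DIM)]
sent_ctx = [random.uniform(-1.0, 1.0) for _ in range(DIM)]
title = random_sequence(6, DIM)
abstract = [random_sequence(5, DIM), random_sequence(8, DIM), random_sequence(3, DIM)]

features = document_features(title, abstract, word_ctx, sent_ctx)
print(len(features))  # twice DIM: title half plus abstract half
```

In a full model the resulting feature vector would be passed to a dense classification layer; that step is left out here.
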
_version_ | 1783349598722457600 |
---|---|
author | Fergadis, Aris Baziotis, Christos Pappas, Dimitris Papageorgiou, Haris Potamianos, Alexandros |
author_sort | Fergadis, Aris |
collection | PubMed |
description | In this paper, we describe a hierarchical bi-directional attention-based Recurrent Neural Network (RNN) as a reusable sequence encoder architecture, which is used as sentence and document encoder for document classification. The sequence encoder is composed of two bi-directional RNNs equipped with an attention mechanism that identifies and captures the most important elements (words or sentences) in a document, followed by a dense layer for the classification task. Our approach exploits the hierarchical nature of documents, which are composed of sequences of sentences, which in turn are composed of sequences of words. In our model, we use word embeddings to project the words to a low-dimensional vector space. We leverage word embeddings trained on PubMed for initializing the embedding layer of our network. We apply this model to biomedical literature, specifically to paper abstracts published in PubMed. We argue that the title of the paper itself usually contains important information that is more salient than a typical sentence in the abstract. For this reason, we propose a shortcut connection that integrates the title vector representation directly into the final feature representation of the document. We concatenate the sentence vector that represents the title and the vectors of the abstract to form the document feature vector used as input to the task classifier. With this system we participated in the Document Triage Task of the BioCreative VI Precision Medicine Track and achieved 0.6289 Precision, 0.7656 Recall and 0.6906 F1-score, with Precision and F1-score ranking first among the participating systems. Database URL: https://github.com/afergadis/BC6PM-HRNN |
format | Online Article Text |
id | pubmed-6105093 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6105093 2018-08-27 Database (Oxford) Original Article Oxford University Press 2018-08-21 /pmc/articles/PMC6105093/ /pubmed/30137284 http://dx.doi.org/10.1093/database/bay076 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Hierarchical bi-directional attention-based RNNs for supporting document classification on protein–protein interactions affected by genetic mutations |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6105093/ https://www.ncbi.nlm.nih.gov/pubmed/30137284 http://dx.doi.org/10.1093/database/bay076 |
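
The description field reports Precision, Recall and F1-score together. As a quick consistency check, the short sketch below recomputes F1 as the harmonic mean of the two reported values. The function name `f1_score` is a hypothetical helper, and the inputs are the rounded figures from the record, so the result should only be expected to agree with the reported 0.6906 up to rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the record's description field.
reported_precision = 0.6289
reported_recall = 0.7656
reported_f1 = 0.6906

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.4f}, reported F1 = {reported_f1:.4f}")
```

Any small difference in the last digit is consistent with Precision and Recall having been rounded before publication.
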