
Hierarchical bi-directional attention-based RNNs for supporting document classification on protein–protein interactions affected by genetic mutations

Bibliographic Details
Main authors: Fergadis, Aris, Baziotis, Christos, Pappas, Dimitris, Papageorgiou, Haris, Potamianos, Alexandros
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6105093/
https://www.ncbi.nlm.nih.gov/pubmed/30137284
http://dx.doi.org/10.1093/database/bay076
collection PubMed
description In this paper, we describe a hierarchical bi-directional attention-based Recurrent Neural Network (RNN) as a reusable sequence encoder architecture, which is used as both the sentence encoder and the document encoder for document classification. The sequence encoder consists of two bi-directional RNNs equipped with an attention mechanism that identifies and captures the most important elements (words or sentences) in a document, followed by a dense layer for the classification task. Our approach exploits the hierarchical structure of documents: a document is a sequence of sentences, and a sentence is a sequence of words. In our model, we use word embeddings to project words into a low-dimensional vector space, and we initialize the embedding layer of our network with word embeddings trained on PubMed. We apply this model to biomedical literature, specifically to paper abstracts published in PubMed. We argue that the title of a paper usually contains important information that is more salient than that of a typical sentence in the abstract. For this reason, we propose a shortcut connection that integrates the title's vector representation directly into the final feature representation of the document: we concatenate the sentence vector that represents the title with the vector representation of the abstract to form the document feature vector used as input to the task classifier. With this system we participated in the Document Triage Task of the BioCreative VI Precision Medicine Track, achieving 0.6289 Precision, 0.7656 Recall and 0.6906 F1-score; our Precision and F1-score ranked first among the participating systems. Database URL: https://github.com/afergadis/BC6PM-HRNN
id pubmed-6105093
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6105093 2018-08-27 Database (Oxford) Original Article
Oxford University Press 2018-08-21 /pmc/articles/PMC6105093/ /pubmed/30137284 http://dx.doi.org/10.1093/database/bay076 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Article
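The data flow summarized in the `description` field above (word-level encoder, sentence-level encoder, attention pooling at each level, a title shortcut concatenated to the document vector, then a dense layer) can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation (their code is at the Database URL listed in the record). Assumed here and not taken from the source: plain tanh (Elman) RNN cells, an additive attention with a learned context vector, random weights and toy sizes, the title being encoded by the same sentence encoder as the abstract sentences, and a single sigmoid output for a relevant / not-relevant decision. The names `AttentiveBiRNNEncoder` and `HierarchicalClassifier` are hypothetical.

```python
import math
import random

# Minimal, UNTRAINED sketch of a hierarchical bi-directional attention encoder.
# ASSUMPTIONS (not from the source): plain tanh RNN cells, additive attention
# with a learned context vector, random weights, toy sizes, the title encoded
# by the shared sentence encoder, and a single sigmoid output unit.

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class RNN:
    """Elman cell run left to right: h_t = tanh(W x_t + U h_{t-1})."""

    def __init__(self, in_dim, hid_dim, rng):
        self.W = rand_matrix(hid_dim, in_dim, rng)
        self.U = rand_matrix(hid_dim, hid_dim, rng)
        self.hid_dim = hid_dim

    def run(self, xs):
        h, states = [0.0] * self.hid_dim, []
        for x in xs:
            pre = [a + b for a, b in zip(matvec(self.W, x), matvec(self.U, h))]
            h = [math.tanh(p) for p in pre]
            states.append(h)
        return states

class AttentiveBiRNNEncoder:
    """Hypothetical reusable encoder: bi-directional RNN + attention pooling."""

    def __init__(self, in_dim, hid_dim, rng):
        self.fwd = RNN(in_dim, hid_dim, rng)
        self.bwd = RNN(in_dim, hid_dim, rng)
        self.out_dim = 2 * hid_dim
        self.att_W = rand_matrix(self.out_dim, self.out_dim, rng)
        self.att_ctx = [rng.uniform(-0.1, 0.1) for _ in range(self.out_dim)]

    def encode(self, xs):
        """Return (pooled vector, attention weights) for a non-empty sequence."""
        fwd = self.fwd.run(xs)
        bwd = self.bwd.run(xs[::-1])[::-1]  # re-align backward states with inputs
        hs = [f + b for f, b in zip(fwd, bwd)]  # concatenate both directions
        scores = [
            sum(c * math.tanh(u) for c, u in zip(self.att_ctx, matvec(self.att_W, h)))
            for h in hs
        ]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        pooled = [sum(w * h[i] for w, h in zip(weights, hs))
                  for i in range(self.out_dim)]
        return pooled, weights

class HierarchicalClassifier:
    """Hypothetical word -> sentence -> document model with a title shortcut."""

    def __init__(self, emb_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        self.sent_enc = AttentiveBiRNNEncoder(emb_dim, hid_dim, rng)
        self.doc_enc = AttentiveBiRNNEncoder(self.sent_enc.out_dim, hid_dim, rng)
        feat_dim = self.doc_enc.out_dim + self.sent_enc.out_dim
        self.dense = [rng.uniform(-0.1, 0.1) for _ in range(feat_dim)]
        self.bias = 0.0

    def predict_proba(self, title, abstract):
        """title: list of word vectors; abstract: list of sentences of word vectors."""
        title_vec, _ = self.sent_enc.encode(title)
        sent_vecs = [self.sent_enc.encode(s)[0] for s in abstract]
        doc_vec, _ = self.doc_enc.encode(sent_vecs)
        features = doc_vec + title_vec  # shortcut: append the title vector
        logit = sum(w * x for w, x in zip(self.dense, features)) + self.bias
        return 1.0 / (1.0 + math.exp(-logit))  # dense layer + sigmoid

# Usage with random stand-ins for pre-trained (e.g. PubMed) word embeddings.
data_rng = random.Random(1)

def word_vec():
    return [data_rng.uniform(-1.0, 1.0) for _ in range(4)]

title = [word_vec() for _ in range(3)]
abstract = [[word_vec() for _ in range(5)], [word_vec() for _ in range(2)]]
model = HierarchicalClassifier(emb_dim=4, hid_dim=3)
p = model.predict_proba(title, abstract)
```

Because the weights are random, `p` carries no meaning; the sketch only shows how the title vector bypasses the document-level encoder and is concatenated with the document vector before the dense layer. Reusing one encoder class at both levels mirrors the "reusable sequence encoder" idea; a real model would use trained weights and, likely, gated recurrent cells.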