
Jointly learning word embeddings using a corpus and a knowledge base


Bibliographic Details
Main Authors: Alsuhaibani, Mohammed, Bollegala, Danushka, Maehara, Takanori, Kawarabayashi, Ken-ichi
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5847320/
https://www.ncbi.nlm.nih.gov/pubmed/29529052
http://dx.doi.org/10.1371/journal.pone.0193094
_version_ 1783305726498701312
author Alsuhaibani, Mohammed
Bollegala, Danushka
Maehara, Takanori
Kawarabayashi, Ken-ichi
author_facet Alsuhaibani, Mohammed
Bollegala, Danushka
Maehara, Takanori
Kawarabayashi, Ken-ichi
author_sort Alsuhaibani, Mohammed
collection PubMed
description Methods for representing the meaning of words in vector spaces purely using the information distributed in text corpora have proved to be very valuable in various text mining and natural language processing (NLP) tasks. However, these methods still disregard the valuable semantic relational structure between words in co-occurring contexts. Such semantic relational structures are contained in manually created knowledge bases (KBs) such as ontologies and semantic lexicons, where the meanings of words are represented by defining the various relationships that exist among those words. We combine the knowledge in both a corpus and a KB to learn better word embeddings. Specifically, we propose a joint word representation learning method that uses the knowledge in the KB and simultaneously predicts the co-occurrences of two words in a corpus context. In particular, we use the corpus to define our objective function, subject to the relational constraints derived from the KB. We further utilise the corpus co-occurrence statistics to propose two novel approaches, Nearest Neighbour Expansion (NNE) and Hedged Nearest Neighbour Expansion (HNE), that dynamically expand the KB and thereby derive more constraints to guide the optimisation process. Our experimental results over a wide range of benchmark tasks demonstrate that the proposed method yields statistically significant improvements in the accuracy of the word embeddings learnt. It outperforms a corpus-only baseline and improves upon a number of previously proposed methods that incorporate corpora and KBs, in both semantic similarity prediction and word analogy detection tasks.
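
The description above specifies the method only at the level of its components: a corpus co-occurrence objective, relational constraints derived from the KB, and NNE/HNE expansion. As a rough illustration only, the following Python sketch shows one plausible shape for such a joint objective: a GloVe-style weighted co-occurrence loss plus a regulariser that pulls KB-related word vectors together, with a simple nearest-neighbour expansion step. Every name, constant, and design choice here is an assumption made for illustration, not the authors' exact formulation (see the DOI above for that).

# Illustrative sketch only; not the formulation from the paper.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, LAM = 1000, 50, 0.1                # vocabulary size, embedding dim, KB weight (assumed)
W = rng.normal(scale=0.1, size=(VOCAB, DIM))   # target word vectors
C = rng.normal(scale=0.1, size=(VOCAB, DIM))   # context word vectors
b = np.zeros(VOCAB)                            # word biases
b_tilde = np.zeros(VOCAB)                      # context biases

def joint_loss(cooc, kb_pairs):
    """cooc: iterable of (i, j, x_ij) co-occurrence counts from the corpus;
    kb_pairs: iterable of (i, j) word pairs related in the knowledge base."""
    corpus_term = 0.0
    for i, j, x in cooc:
        weight = min((x / 100.0) ** 0.75, 1.0)          # GloVe-style weighting function
        err = W[i] @ C[j] + b[i] + b_tilde[j] - np.log(x)
        corpus_term += weight * err ** 2
    # KB regulariser: penalise distance between vectors of related words
    kb_term = sum(np.sum((W[i] - W[j]) ** 2) for i, j in kb_pairs)
    return corpus_term + LAM * kb_term

def nne_expand(kb_pairs, cooc):
    """Nearest-neighbour expansion (illustrative): additionally constrain each
    KB word to its strongest corpus co-occurrent."""
    top = {}
    for i, j, x in cooc:
        if x > top.get(i, (None, 0.0))[1]:
            top[i] = (j, x)
    expanded = set(kb_pairs)
    for i, _ in kb_pairs:
        if i in top:
            expanded.add((i, top[i][0]))
    return expanded

In this reading, LAM trades off corpus fit against KB agreement, and expanding kb_pairs with nne_expand simply enlarges the set of pairs the regulariser acts on, which matches the description's claim that NNE/HNE "derive more constraints that guide the optimisation process".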
format Online
Article
Text
id pubmed-5847320
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5847320 2018-03-23 Jointly learning word embeddings using a corpus and a knowledge base Alsuhaibani, Mohammed Bollegala, Danushka Maehara, Takanori Kawarabayashi, Ken-ichi PLoS One Research Article Methods for representing the meaning of words in vector spaces purely using the information distributed in text corpora have proved to be very valuable in various text mining and natural language processing (NLP) tasks. However, these methods still disregard the valuable semantic relational structure between words in co-occurring contexts. Such semantic relational structures are contained in manually created knowledge bases (KBs) such as ontologies and semantic lexicons, where the meanings of words are represented by defining the various relationships that exist among those words. We combine the knowledge in both a corpus and a KB to learn better word embeddings. Specifically, we propose a joint word representation learning method that uses the knowledge in the KB and simultaneously predicts the co-occurrences of two words in a corpus context. In particular, we use the corpus to define our objective function, subject to the relational constraints derived from the KB. We further utilise the corpus co-occurrence statistics to propose two novel approaches, Nearest Neighbour Expansion (NNE) and Hedged Nearest Neighbour Expansion (HNE), that dynamically expand the KB and thereby derive more constraints to guide the optimisation process. Our experimental results over a wide range of benchmark tasks demonstrate that the proposed method yields statistically significant improvements in the accuracy of the word embeddings learnt. It outperforms a corpus-only baseline and improves upon a number of previously proposed methods that incorporate corpora and KBs, in both semantic similarity prediction and word analogy detection tasks. Public Library of Science 2018-03-12 /pmc/articles/PMC5847320/ /pubmed/29529052 http://dx.doi.org/10.1371/journal.pone.0193094 Text en © 2018 Alsuhaibani et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Alsuhaibani, Mohammed
Bollegala, Danushka
Maehara, Takanori
Kawarabayashi, Ken-ichi
Jointly learning word embeddings using a corpus and a knowledge base
title Jointly learning word embeddings using a corpus and a knowledge base
title_full Jointly learning word embeddings using a corpus and a knowledge base
title_fullStr Jointly learning word embeddings using a corpus and a knowledge base
title_full_unstemmed Jointly learning word embeddings using a corpus and a knowledge base
title_short Jointly learning word embeddings using a corpus and a knowledge base
title_sort jointly learning word embeddings using a corpus and a knowledge base
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5847320/
https://www.ncbi.nlm.nih.gov/pubmed/29529052
http://dx.doi.org/10.1371/journal.pone.0193094
work_keys_str_mv AT alsuhaibanimohammed jointlylearningwordembeddingsusingacorpusandaknowledgebase
AT bollegaladanushka jointlylearningwordembeddingsusingacorpusandaknowledgebase
AT maeharatakanori jointlylearningwordembeddingsusingacorpusandaknowledgebase
AT kawarabayashikenichi jointlylearningwordembeddingsusingacorpusandaknowledgebase