
Jointly learning word embeddings using a corpus and a knowledge base


Bibliographic Details
Main Authors: Alsuhaibani, Mohammed, Bollegala, Danushka, Maehara, Takanori, Kawarabayashi, Ken-ichi
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5847320/
https://www.ncbi.nlm.nih.gov/pubmed/29529052
http://dx.doi.org/10.1371/journal.pone.0193094
_version_ 1783305726498701312
author Alsuhaibani, Mohammed
Bollegala, Danushka
Maehara, Takanori
Kawarabayashi, Ken-ichi
author_facet Alsuhaibani, Mohammed
Bollegala, Danushka
Maehara, Takanori
Kawarabayashi, Ken-ichi
author_sort Alsuhaibani, Mohammed
collection PubMed
description Methods for representing the meaning of words in vector spaces purely using the information distributed in text corpora have proved to be very valuable in various text mining and natural language processing (NLP) tasks. However, these methods still disregard the valuable semantic relational structure between words in co-occurring contexts. Such semantic relational structures are contained in manually created knowledge bases (KBs) such as ontologies and semantic lexicons, where the meanings of words are represented by defining the various relationships that exist among those words. We combine the knowledge in both a corpus and a KB to learn better word embeddings. Specifically, we propose a joint word representation learning method that uses the knowledge in the KB and simultaneously predicts the co-occurrences of two words in a corpus context. In particular, we use the corpus to define our objective function, subject to the relational constraints derived from the KB. We further utilise the corpus co-occurrence statistics to propose two novel approaches, Nearest Neighbour Expansion (NNE) and Hedged Nearest Neighbour Expansion (HNE), that dynamically expand the KB and thereby derive more constraints to guide the optimisation process. Our experimental results over a wide range of benchmark tasks demonstrate that the proposed method yields statistically significant improvements in the accuracy of the word embeddings learnt. It outperforms a corpus-only baseline and improves upon a number of previously proposed methods that incorporate corpora and KBs, in both semantic similarity prediction and word analogy detection tasks.
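
The description above specifies the method only at the level of its components: a corpus co-occurrence objective, relational constraints derived from the KB, and NNE/HNE expansion. As a rough illustration only, the following Python sketch shows one plausible shape for such a joint objective: a GloVe-style weighted co-occurrence loss plus a regulariser that pulls KB-related word vectors together, with a simple nearest-neighbour expansion step. Every name, constant, and design choice here is an assumption made for illustration, not the authors' exact formulation (see the DOI above for that).

# Illustrative sketch only; not the formulation from the paper.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, LAM = 1000, 50, 0.1                # vocabulary size, embedding dim, KB weight (assumed)
W = rng.normal(scale=0.1, size=(VOCAB, DIM))   # target word vectors
C = rng.normal(scale=0.1, size=(VOCAB, DIM))   # context word vectors
b = np.zeros(VOCAB)                            # word biases
b_tilde = np.zeros(VOCAB)                      # context biases

def joint_loss(cooc, kb_pairs):
    """cooc: iterable of (i, j, x_ij) co-occurrence counts from the corpus;
    kb_pairs: iterable of (i, j) word pairs related in the knowledge base."""
    corpus_term = 0.0
    for i, j, x in cooc:
        weight = min((x / 100.0) ** 0.75, 1.0)          # GloVe-style weighting function
        err = W[i] @ C[j] + b[i] + b_tilde[j] - np.log(x)
        corpus_term += weight * err ** 2
    # KB regulariser: penalise distance between vectors of related words
    kb_term = sum(np.sum((W[i] - W[j]) ** 2) for i, j in kb_pairs)
    return corpus_term + LAM * kb_term

def nne_expand(kb_pairs, cooc):
    """Nearest-neighbour expansion (illustrative): additionally constrain each
    KB word to its strongest corpus co-occurrent."""
    top = {}
    for i, j, x in cooc:
        if x > top.get(i, (None, 0.0))[1]:
            top[i] = (j, x)
    expanded = set(kb_pairs)
    for i, _ in kb_pairs:
        if i in top:
            expanded.add((i, top[i][0]))
    return expanded

In this reading, LAM trades off corpus fit against KB agreement, and expanding kb_pairs with nne_expand simply enlarges the set of pairs the regulariser acts on, which matches the description's claim that NNE/HNE "derive more constraints that guide the optimisation process".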
format Online
Article
Text
id pubmed-5847320
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5847320 2018-03-23 Jointly learning word embeddings using a corpus and a knowledge base Alsuhaibani, Mohammed Bollegala, Danushka Maehara, Takanori Kawarabayashi, Ken-ichi PLoS One Research Article Methods for representing the meaning of words in vector spaces purely using the information distributed in text corpora have proved to be very valuable in various text mining and natural language processing (NLP) tasks. However, these methods still disregard the valuable semantic relational structure between words in co-occurring contexts. Such semantic relational structures are contained in manually created knowledge bases (KBs) such as ontologies and semantic lexicons, where the meanings of words are represented by defining the various relationships that exist among those words. We combine the knowledge in both a corpus and a KB to learn better word embeddings. Specifically, we propose a joint word representation learning method that uses the knowledge in the KB and simultaneously predicts the co-occurrences of two words in a corpus context. In particular, we use the corpus to define our objective function, subject to the relational constraints derived from the KB. We further utilise the corpus co-occurrence statistics to propose two novel approaches, Nearest Neighbour Expansion (NNE) and Hedged Nearest Neighbour Expansion (HNE), that dynamically expand the KB and thereby derive more constraints to guide the optimisation process. Our experimental results over a wide range of benchmark tasks demonstrate that the proposed method yields statistically significant improvements in the accuracy of the word embeddings learnt. It outperforms a corpus-only baseline and improves upon a number of previously proposed methods that incorporate corpora and KBs, in both semantic similarity prediction and word analogy detection tasks. Public Library of Science 2018-03-12 /pmc/articles/PMC5847320/ /pubmed/29529052 http://dx.doi.org/10.1371/journal.pone.0193094 Text en © 2018 Alsuhaibani et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Alsuhaibani, Mohammed
Bollegala, Danushka
Maehara, Takanori
Kawarabayashi, Ken-ichi
Jointly learning word embeddings using a corpus and a knowledge base
title Jointly learning word embeddings using a corpus and a knowledge base
title_full Jointly learning word embeddings using a corpus and a knowledge base
title_fullStr Jointly learning word embeddings using a corpus and a knowledge base
title_full_unstemmed Jointly learning word embeddings using a corpus and a knowledge base
title_short Jointly learning word embeddings using a corpus and a knowledge base
title_sort jointly learning word embeddings using a corpus and a knowledge base
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5847320/
https://www.ncbi.nlm.nih.gov/pubmed/29529052
http://dx.doi.org/10.1371/journal.pone.0193094
work_keys_str_mv AT alsuhaibanimohammed jointlylearningwordembeddingsusingacorpusandaknowledgebase
AT bollegaladanushka jointlylearningwordembeddingsusingacorpusandaknowledgebase
AT maeharatakanori jointlylearningwordembeddingsusingacorpusandaknowledgebase
AT kawarabayashikenichi jointlylearningwordembeddingsusingacorpusandaknowledgebase