
On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms

Word Embeddings are widely used in many Natural Language Processing (NLP) applications. They are coordinates associated with each word in a dictionary, inferred from statistical properties of these words in a large corpus. In this paper we introduce the notion of a “concept” as a list of words that share semantic content. We use this notion to analyse the learnability of certain concepts, defined as the capability of a classifier to recognise unseen members of a concept after training on a random subset of it. We first use this method to measure the learnability of concepts on pretrained word embeddings. We then develop a statistical analysis of concept learnability, based on hypothesis testing and ROC curves, in order to compare the relative merits of various embedding algorithms using a fixed corpus and fixed hyperparameters. We find that all embedding methods capture the semantic content of those word lists, but fastText performs better than the others.
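The abstract defines learnability operationally: train a classifier on a random subset of a concept's word list, then test whether it recognises the unseen members, and summarise performance with ROC curves. The minimal sketch below illustrates that procedure under stated assumptions; the embedding lookup, the concept and negative word lists, the train/test split ratio, and the choice of logistic regression are illustrative, not details taken from the paper.

# Hedged sketch (not the authors' code): estimate the "learnability" of a
# concept, i.e. how well a classifier recognises held-out members of a word
# list after training on a random subset of it.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def concept_learnability(embeddings, concept_words, other_words,
                         train_fraction=0.5, seed=0):
    """ROC AUC on held-out concept words vs. held-out out-of-concept words.

    embeddings    : dict mapping word -> embedding vector (np.ndarray)
    concept_words : words assumed to share semantic content (positives)
    other_words   : words outside the concept (negatives)
    """
    rng = np.random.default_rng(seed)

    def split(words):
        # Keep only words with an embedding, shuffle, then split train/test.
        words = [w for w in words if w in embeddings]
        words = list(rng.permutation(words))
        cut = int(len(words) * train_fraction)
        return words[:cut], words[cut:]

    pos_train, pos_test = split(concept_words)
    neg_train, neg_test = split(other_words)

    X_train = np.array([embeddings[w] for w in pos_train + neg_train])
    y_train = np.array([1] * len(pos_train) + [0] * len(neg_train))
    X_test = np.array([embeddings[w] for w in pos_test + neg_test])
    y_test = np.array([1] * len(pos_test) + [0] * len(neg_test))

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    scores = clf.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, scores)

AUC values of this kind, computed with the same word lists over embeddings produced by different algorithms on a fixed corpus with fixed hyperparameters, can then be compared across methods, which is the comparison the abstract describes.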


Bibliographic Details
Main Authors: Sutton, Adam; Cristianini, Nello
Format: Online Article Text
Language: English
Published: 2020
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7256569/
http://dx.doi.org/10.1007/978-3-030-49186-4_35
Collection: PubMed
Record ID: pubmed-7256569
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Series: Artificial Intelligence Applications and Innovations
Published Online: 2020-05-06
Rights: © IFIP International Federation for Information Processing 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.