
On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms

Word Embeddings are widely used in many Natural Language Processing (NLP) applications. They are coordinates associated with each word in a dictionary, inferred from statistical properties of these words in a large corpus. In this paper we introduce the notion of a “concept” as a list of words that share semantic content. We use this notion to analyse the learnability of certain concepts, defined as the capability of a classifier to recognise unseen members of a concept after training on a random subset of it. We first use this method to measure the learnability of concepts on pretrained word embeddings. We then develop a statistical analysis of concept learnability, based on hypothesis testing and ROC curves, in order to compare the relative merits of various embedding algorithms using a fixed corpus and fixed hyperparameters. We find that all embedding methods capture the semantic content of those word lists, but fastText performs better than the others.
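The abstract defines learnability operationally: train a classifier on a random subset of a concept's word list, then test whether it recognises the unseen members, and summarise performance with ROC curves. The minimal sketch below illustrates that procedure under stated assumptions; the embedding lookup, the concept and negative word lists, the train/test split ratio, and the choice of logistic regression are illustrative, not details taken from the paper.

# Hedged sketch (not the authors' code): estimate the "learnability" of a
# concept, i.e. how well a classifier recognises held-out members of a word
# list after training on a random subset of it.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def concept_learnability(embeddings, concept_words, other_words,
                         train_fraction=0.5, seed=0):
    """ROC AUC on held-out concept words vs. held-out out-of-concept words.

    embeddings    : dict mapping word -> embedding vector (np.ndarray)
    concept_words : words assumed to share semantic content (positives)
    other_words   : words outside the concept (negatives)
    """
    rng = np.random.default_rng(seed)

    def split(words):
        # Keep only words with an embedding, shuffle, then split train/test.
        words = [w for w in words if w in embeddings]
        words = list(rng.permutation(words))
        cut = int(len(words) * train_fraction)
        return words[:cut], words[cut:]

    pos_train, pos_test = split(concept_words)
    neg_train, neg_test = split(other_words)

    X_train = np.array([embeddings[w] for w in pos_train + neg_train])
    y_train = np.array([1] * len(pos_train) + [0] * len(neg_train))
    X_test = np.array([embeddings[w] for w in pos_test + neg_test])
    y_test = np.array([1] * len(pos_test) + [0] * len(neg_test))

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    scores = clf.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, scores)

AUC values of this kind, computed with the same word lists over embeddings produced by different algorithms on a fixed corpus with fixed hyperparameters, can then be compared across methods, which is the comparison the abstract describes.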


Bibliographic Details
Main Authors: Sutton, Adam; Cristianini, Nello
Format: Online Article Text
Language: English
Published: 2020
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7256569/
http://dx.doi.org/10.1007/978-3-030-49186-4_35
Collection: PubMed
Record ID: pubmed-7256569
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Series: Artificial Intelligence Applications and Innovations
Published Online: 2020-05-06
Rights: © IFIP International Federation for Information Processing 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.