On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms
Word Embeddings are used widely in multiple Natural Language Processing (NLP) applications. They are coordinates associated with each word in a dictionary, inferred from statistical properties of these words in a large corpus. In this paper we introduce the notion of “concept” as a list of words tha...
Main authors: | Sutton, Adam; Cristianini, Nello |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7256569/ http://dx.doi.org/10.1007/978-3-030-49186-4_35 |
_version_ | 1783539939251585024 |
---|---|
author | Sutton, Adam Cristianini, Nello |
author_facet | Sutton, Adam Cristianini, Nello |
author_sort | Sutton, Adam |
collection | PubMed |
description | Word Embeddings are used widely in multiple Natural Language Processing (NLP) applications. They are coordinates associated with each word in a dictionary, inferred from statistical properties of these words in a large corpus. In this paper we introduce the notion of “concept” as a list of words that have shared semantic content. We use this notion to analyse the learnability of certain concepts, defined as the capability of a classifier to recognise unseen members of a concept after training on a random subset of it. We first use this method to measure the learnability of concepts on pretrained word embeddings. We then develop a statistical analysis of concept learnability, based on hypothesis testing and ROC curves, in order to compare the relative merits of various embedding algorithms using a fixed corpora and hyper parameters. We find that all embedding methods capture the semantic content of those word lists, but fastText performs better than the others. |
format | Online Article Text |
id | pubmed-7256569 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-72565692020-05-29 On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms Sutton, Adam Cristianini, Nello Artificial Intelligence Applications and Innovations Article Word Embeddings are used widely in multiple Natural Language Processing (NLP) applications. They are coordinates associated with each word in a dictionary, inferred from statistical properties of these words in a large corpus. In this paper we introduce the notion of “concept” as a list of words that have shared semantic content. We use this notion to analyse the learnability of certain concepts, defined as the capability of a classifier to recognise unseen members of a concept after training on a random subset of it. We first use this method to measure the learnability of concepts on pretrained word embeddings. We then develop a statistical analysis of concept learnability, based on hypothesis testing and ROC curves, in order to compare the relative merits of various embedding algorithms using a fixed corpora and hyper parameters. We find that all embedding methods capture the semantic content of those word lists, but fastText performs better than the others. 2020-05-06 /pmc/articles/PMC7256569/ http://dx.doi.org/10.1007/978-3-030-49186-4_35 Text en © IFIP International Federation for Information Processing 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Sutton, Adam Cristianini, Nello On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms |
title | On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms |
title_full | On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms |
title_fullStr | On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms |
title_full_unstemmed | On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms |
title_short | On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms |
title_sort | on the learnability of concepts: with applications to comparing word embedding algorithms |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7256569/ http://dx.doi.org/10.1007/978-3-030-49186-4_35 |
work_keys_str_mv | AT suttonadam onthelearnabilityofconceptswithapplicationstocomparingwordembeddingalgorithms AT cristianininello onthelearnabilityofconceptswithapplicationstocomparingwordembeddingalgorithms |