The Spectral Underpinning of word2vec
Word2vec introduced by Mikolov et al. is a word embedding method that is widely used in natural language processing. Despite its success and frequent use, a strong theoretical justification is still lacking. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear...
Main authors: | Jaffe, Ariel; Kluger, Yuval; Lindenbaum, Ofir; Patsenker, Jonathan; Peterfreund, Erez; Steinerberger, Stefan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8425479/ https://www.ncbi.nlm.nih.gov/pubmed/34504892 http://dx.doi.org/10.3389/fams.2020.593406 |
_version_ | 1783749854063755264 |
---|---|
author | Jaffe, Ariel Kluger, Yuval Lindenbaum, Ofir Patsenker, Jonathan Peterfreund, Erez Steinerberger, Stefan |
author_facet | Jaffe, Ariel Kluger, Yuval Lindenbaum, Ofir Patsenker, Jonathan Peterfreund, Erez Steinerberger, Stefan |
author_sort | Jaffe, Ariel |
collection | PubMed |
description | Word2vec introduced by Mikolov et al. is a word embedding method that is widely used in natural language processing. Despite its success and frequent use, a strong theoretical justification is still lacking. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings by numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism. |
format | Online Article Text |
id | pubmed-8425479 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-8425479 2021-09-08 The Spectral Underpinning of word2vec Jaffe, Ariel Kluger, Yuval Lindenbaum, Ofir Patsenker, Jonathan Peterfreund, Erez Steinerberger, Stefan Front Appl Math Stat Article Word2vec introduced by Mikolov et al. is a word embedding method that is widely used in natural language processing. Despite its success and frequent use, a strong theoretical justification is still lacking. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings by numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism. 2020-12-03 2020-12 /pmc/articles/PMC8425479/ /pubmed/34504892 http://dx.doi.org/10.3389/fams.2020.593406 Text en https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) (https://creativecommons.org/licenses/by/4.0/). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Article Jaffe, Ariel Kluger, Yuval Lindenbaum, Ofir Patsenker, Jonathan Peterfreund, Erez Steinerberger, Stefan The Spectral Underpinning of word2vec |
title | The Spectral Underpinning of word2vec |
title_sort | spectral underpinning of word2vec |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8425479/ https://www.ncbi.nlm.nih.gov/pubmed/34504892 http://dx.doi.org/10.3389/fams.2020.593406 |