The Spectral Underpinning of word2vec

Bibliographic Details
Main authors: Jaffe, Ariel; Kluger, Yuval; Lindenbaum, Ofir; Patsenker, Jonathan; Peterfreund, Erez; Steinerberger, Stefan
Format: Online, Article, Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8425479/
https://www.ncbi.nlm.nih.gov/pubmed/34504892
http://dx.doi.org/10.3389/fams.2020.593406

Description: Word2vec, introduced by Mikolov et al., is a word embedding method that is widely used in natural language processing. Despite its success and frequent use, word2vec still lacks a strong theoretical justification. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings with numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism.
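The description's central claim is that word2vec may be driven mainly by a spectral method. As a rough, hypothetical illustration of what a spectral word embedding looks like in general (not the specific functional or matrix analyzed in the paper), the sketch below counts co-occurrences in a toy corpus, converts them to a positive pointwise mutual information (PPMI) matrix, and uses the top eigenvectors of that matrix as word vectors. The toy corpus, the window size, the PPMI weighting, and every function name (`cooccurrence`, `ppmi_matrix`, `top_eigenpairs`, `spectral_embedding`) are assumptions made for illustration; the code uses only the Python standard library.

```python
import math
import random
from collections import Counter

# Hypothetical illustration: a generic PPMI spectral word embedding,
# not the functional or matrix analyzed by Jaffe et al.


def cooccurrence(sentences, window=1):
    """Count symmetric co-occurrences of words within `window` positions."""
    pairs = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    pairs[(w, sent[j])] += 1
    return pairs


def ppmi_matrix(pairs):
    """Build the (symmetric) positive pointwise mutual information matrix."""
    vocab = sorted({w for w, _ in pairs})
    index = {w: k for k, w in enumerate(vocab)}
    marginal = Counter()
    for (w, _), c in pairs.items():
        marginal[w] += c
    total = sum(pairs.values())
    m = [[0.0] * len(vocab) for _ in vocab]
    for (w, u), c in pairs.items():
        pmi = math.log(c * total / (marginal[w] * marginal[u]))
        m[index[w]][index[u]] = max(pmi, 0.0)  # clip negative PMI to zero
    return vocab, m


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(a, v):
    return [dot(row, v) for row in a]


def gram_schmidt(vectors):
    """Orthonormalize vectors in order (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def top_eigenpairs(m, k, iters=500, seed=0):
    """Approximate the k largest eigenpairs of a symmetric matrix m."""
    n = len(m)
    # Gershgorin bound + 1: every eigenvalue of m + shift*I is strictly positive,
    # so subspace iteration targets the largest (not largest-magnitude) eigenvalues.
    shift = max(sum(abs(x) for x in row) for row in m) + 1.0
    a = [[m[i][j] + (shift if i == j else 0.0) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    q = gram_schmidt([[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)])
    for _ in range(iters):
        q = gram_schmidt([matvec(a, v) for v in q])
    values = [dot(v, matvec(m, v)) for v in q]  # Rayleigh quotients w.r.t. m
    return values, q


def spectral_embedding(sentences, k=2, window=1):
    """Map each word to its entries in the top-k eigenvectors, scaled by sqrt(eigenvalue)."""
    vocab, m = ppmi_matrix(cooccurrence(sentences, window))
    values, vectors = top_eigenpairs(m, k)
    scales = [math.sqrt(max(lam, 0.0)) for lam in values]
    return {w: [s * v[i] for s, v in zip(scales, vectors)] for i, w in enumerate(vocab)}


if __name__ == "__main__":
    corpus = [s.split() for s in (
        "the cat sat on the mat",
        "the dog sat on the rug",
        "a cat chased a dog",
    )]
    for word, vec in sorted(spectral_embedding(corpus).items()):
        print(word, [round(x, 3) for x in vec])
```

The diagonal shift keeps every eigenvalue of the iterated matrix positive, so subspace iteration converges to the largest eigenvalues of the PPMI matrix rather than to those largest in magnitude. In practice a library eigensolver such as `numpy.linalg.eigh` would normally replace the hand-written loop.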

Journal: Frontiers in Applied Mathematics and Statistics (Front Appl Math Stat), publication date 2020-12-03
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.