Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation

Bibliographic Details
Main Authors: Wei, Chao; Luo, Senlin; Ma, Xincheng; Ren, Hao; Zhang, Ji; Pan, Limin
Format: Online Article, Text
Language: English
Published: Public Library of Science, 2016
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4718658/
https://www.ncbi.nlm.nih.gov/pubmed/26784692
http://dx.doi.org/10.1371/journal.pone.0146672
Collection: PubMed
Description: Topic models and neural networks can discover meaningful low-dimensional latent representations of text corpora; as such, they have become a key technology for document representation. However, such models presume that all documents are non-discriminatory, resulting in latent representations that depend on all other documents and cannot provide discriminative document representations. To address this problem, we propose a semi-supervised, manifold-inspired autoencoder that extracts meaningful latent representations of documents, taking the local perspective that the latent representations of nearby documents should be correlated. We first determine the set of discriminative neighbors using Euclidean distance in the observation space. The autoencoder is then trained by jointly minimizing the Bernoulli cross-entropy error between the input and the output and the sum of squared errors between the neighbors of the input and the output. Results on two widely used corpora show that our method yields at least a 15% improvement in document clustering and a nearly 7% improvement in classification tasks over the compared methods. This evidence demonstrates that our method can readily capture more discriminative latent representations of new documents. Moreover, some meaningful combinations of words can be efficiently discovered by activating features that promote the comprehensibility of the latent representation.
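The two-step training objective summarized in the description can be sketched in code. This is a minimal, non-authoritative illustration, not the authors' implementation, and it rests on several assumptions not stated in this record: documents are binary bag-of-words vectors; the encoder and decoder are single sigmoid layers; the "discriminative neighbors" of a labeled document are taken to be its k nearest same-label documents by Euclidean distance, while unlabeled documents (label -1) get no neighbors; and the neighbor term carries a hypothetical weight `lam`. All function names (`discriminative_neighbors`, `lea_loss`, `corpus_loss`, etc.) are illustrative. The sketch only evaluates the joint loss; it does not train.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight matrix, bias vector)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two documents in the observation space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def discriminative_neighbors(docs: List[Vector], labels: List[int], k: int) -> Dict[int, List[int]]:
    """ASSUMPTION: a labeled document's neighbors are its k nearest documents
    (Euclidean distance) sharing its label; unlabeled documents (-1) get none."""
    neighbors: Dict[int, List[int]] = {}
    for i, (x, y) in enumerate(zip(docs, labels)):
        if y < 0:
            neighbors[i] = []
            continue
        same_label = [j for j, yj in enumerate(labels) if j != i and yj == y]
        same_label.sort(key=lambda j: euclidean(x, docs[j]))
        neighbors[i] = same_label[:k]
    return neighbors


def init_layer(n_out: int, n_in: int, rng: random.Random, scale: float = 0.1) -> Layer:
    """Small uniform random weights and zero biases (hypothetical initialization)."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def sigmoid_layer(layer: Layer, x: Sequence[float]) -> Vector:
    """Computes sigmoid(W x + b) for one fully connected layer."""
    weights, bias = layer
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
            for row, b in zip(weights, bias)]


def bernoulli_cross_entropy(x: Sequence[float], x_hat: Sequence[float], eps: float = 1e-12) -> float:
    """Reconstruction error between a binary input and the output probabilities."""
    return -sum(xi * math.log(p + eps) + (1 - xi) * math.log(1 - p + eps)
                for xi, p in zip(x, x_hat))


def lea_loss(i: int, docs: List[Vector], neighbors: Dict[int, List[int]],
             encoder: Layer, decoder: Layer, lam: float) -> float:
    """Joint objective for document i: cross-entropy(input, output) plus
    lam * sum of squared errors between each neighbor of the input and the output."""
    x = docs[i]
    x_hat = sigmoid_layer(decoder, sigmoid_layer(encoder, x))
    recon = bernoulli_cross_entropy(x, x_hat)
    neigh = sum(sum((nj - pj) ** 2 for nj, pj in zip(docs[n], x_hat)) for n in neighbors[i])
    return recon + lam * neigh


def corpus_loss(docs: List[Vector], neighbors: Dict[int, List[int]],
                encoder: Layer, decoder: Layer, lam: float) -> float:
    """Sum of the per-document joint objective over the whole corpus."""
    return sum(lea_loss(i, docs, neighbors, encoder, decoder, lam) for i in range(len(docs)))


# Toy corpus: binary bag-of-words over a 4-word vocabulary; -1 marks an unlabeled document.
docs = [[1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 1], [1, 1, 0, 0]]
labels = [0, 0, 1, 1, -1]
nbrs = discriminative_neighbors(docs, labels, k=1)
rng = random.Random(0)
encoder = init_layer(2, 4, rng)  # 4 inputs -> 2 latent units
decoder = init_layer(4, 2, rng)  # 2 latent units -> 4 outputs
print(nbrs)  # {0: [1], 1: [0], 2: [3], 3: [2], 4: []}
print(corpus_loss(docs, nbrs, encoder, decoder, lam=0.5))
```

In this sketch the neighbor term pulls each labeled document's reconstruction toward its same-label neighbors, which is one reading of "the sum of the square error between neighbors of input and output." A real implementation would minimize `corpus_loss` with gradient descent (for example through an automatic-differentiation library) rather than merely evaluate it.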
ID: pubmed-4718658
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: PLoS One
Publication date: 2016-01-19
Rights: © 2016 Wei et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Topic: Research Article