
Deep Low-Density Separation for Semi-supervised Classification

Given a small set of labeled data and a large set of unlabeled data, semi-supervised learning (ssl) attempts to leverage the location of the unlabeled datapoints in order to create a better classifier than could be obtained from supervised methods applied to the labeled training set alone. Effective...


Bibliographic Details
Main authors: Burkhart, Michael C., Shan, Kyle
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304056/
http://dx.doi.org/10.1007/978-3-030-50420-5_22
_version_ 1783548189071114240
author Burkhart, Michael C.
Shan, Kyle
author_facet Burkhart, Michael C.
Shan, Kyle
author_sort Burkhart, Michael C.
collection PubMed
description Given a small set of labeled data and a large set of unlabeled data, semi-supervised learning (ssl) attempts to leverage the location of the unlabeled datapoints in order to create a better classifier than could be obtained from supervised methods applied to the labeled training set alone. Effective ssl imposes structural assumptions on the data, e.g., that neighbors are more likely to share a classification or that the decision boundary lies in an area of low density. For complex and high-dimensional data, neural networks can learn feature embeddings to which traditional ssl methods can then be applied in what we call hybrid methods. Previously developed hybrid methods iterate between refining a latent representation and performing graph-based ssl on this representation. In this paper, we introduce a novel hybrid method that instead applies low-density separation to the embedded features. We describe it in detail and discuss why low-density separation may be better suited for ssl on neural network-based embeddings than graph-based algorithms. We validate our method using in-house customer survey data and compare it to other state-of-the-art learning methods. Our approach effectively classifies thousands of unlabeled users from a relatively small number of hand-classified examples.
format Online
Article
Text
id pubmed-7304056
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7304056 2020-06-19 Deep Low-Density Separation for Semi-supervised Classification Burkhart, Michael C. Shan, Kyle Computational Science – ICCS 2020 Article 2020-05-22 /pmc/articles/PMC7304056/ http://dx.doi.org/10.1007/978-3-030-50420-5_22 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Burkhart, Michael C.
Shan, Kyle
Deep Low-Density Separation for Semi-supervised Classification
title Deep Low-Density Separation for Semi-supervised Classification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304056/
http://dx.doi.org/10.1007/978-3-030-50420-5_22
work_keys_str_mv AT burkhartmichaelc deeplowdensityseparationforsemisupervisedclassification
AT shankyle deeplowdensityseparationforsemisupervisedclassification