
Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme

One of the major factors affecting the performance of classification algorithms is the amount of labeled data available during the training phase. It is widely accepted that labeling vast amounts of data is both expensive and time-consuming, since it requires human expertise. In a wide variety of scientific fields, unlabeled examples are easy to collect but hard to exploit in a way that enriches the information contained in a given dataset. In this context, a variety of learning methods that aim to efficiently utilize the vast amounts of unlabeled data during the learning process have been studied in the literature. The most common approaches tackle problems of this kind by applying either active learning or semi-supervised learning methods in isolation. In this work, a combination of active learning and semi-supervised learning methods is proposed under a common self-training scheme, in order to efficiently utilize the available unlabeled data. The effective and robust metrics of the entropy and the probability distribution of predictions over the unlabeled set are used to select the most suitable unlabeled examples for augmenting the initial labeled set. The superiority of the proposed scheme is validated by comparing it against the baseline approaches of supervised, semi-supervised, and active learning on a wide range of fifty-five benchmark datasets.


Bibliographic Details
Main Authors: Fazakis, Nikos, Kanas, Vasileios G., Aridas, Christos K., Karlos, Stamatis, Kotsiantis, Sotiris
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514320/
http://dx.doi.org/10.3390/e21100988
_version_ 1783586560889847808
author Fazakis, Nikos
Kanas, Vasileios G.
Aridas, Christos K.
Karlos, Stamatis
Kotsiantis, Sotiris
author_facet Fazakis, Nikos
Kanas, Vasileios G.
Aridas, Christos K.
Karlos, Stamatis
Kotsiantis, Sotiris
author_sort Fazakis, Nikos
collection PubMed
description One of the major factors affecting the performance of classification algorithms is the amount of labeled data available during the training phase. It is widely accepted that labeling vast amounts of data is both expensive and time-consuming, since it requires human expertise. In a wide variety of scientific fields, unlabeled examples are easy to collect but hard to exploit in a way that enriches the information contained in a given dataset. In this context, a variety of learning methods that aim to efficiently utilize the vast amounts of unlabeled data during the learning process have been studied in the literature. The most common approaches tackle problems of this kind by applying either active learning or semi-supervised learning methods in isolation. In this work, a combination of active learning and semi-supervised learning methods is proposed under a common self-training scheme, in order to efficiently utilize the available unlabeled data. The effective and robust metrics of the entropy and the probability distribution of predictions over the unlabeled set are used to select the most suitable unlabeled examples for augmenting the initial labeled set. The superiority of the proposed scheme is validated by comparing it against the baseline approaches of supervised, semi-supervised, and active learning on a wide range of fifty-five benchmark datasets.
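To make the selection mechanism concrete, a minimal Python sketch follows. It is an illustration of the scheme described in the abstract, not the authors' exact implementation: in each self-training round, the unlabeled examples with the lowest prediction entropy are pseudo-labeled by the current model (the semi-supervised step), while the highest-entropy examples are sent to a human annotator for true labels (the active learning step). The `oracle_label` callback, the base classifier, and all round sizes are assumptions chosen for illustration.

    import numpy as np
    from scipy.stats import entropy
    from sklearn.base import clone
    from sklearn.linear_model import LogisticRegression

    def al_ssl_self_training(X_lab, y_lab, X_unlab, oracle_label,
                             n_iter=10, n_confident=10, n_query=5):
        """Grow the labeled set from X_unlab over n_iter self-training rounds.

        oracle_label(x) is a hypothetical callback returning the true label
        of a single example (e.g., a human annotator).
        """
        clf = LogisticRegression(max_iter=1000)  # any probabilistic base learner works
        for _ in range(n_iter):
            if len(X_unlab) < n_confident + n_query:
                break  # pool too small to split between SSL and AL picks
            model = clone(clf).fit(X_lab, y_lab)
            proba = model.predict_proba(X_unlab)   # class probability distribution
            ent = entropy(proba, axis=1)           # per-example prediction entropy
            order = np.argsort(ent)
            conf_idx = order[:n_confident]         # lowest entropy: pseudo-label (SSL)
            query_idx = order[-n_query:]           # highest entropy: ask the oracle (AL)
            new_y = np.concatenate([
                proba[conf_idx].argmax(axis=1),                 # self-training labels
                [oracle_label(X_unlab[i]) for i in query_idx],  # oracle labels
            ])
            picked = np.concatenate([conf_idx, query_idx])
            X_lab = np.vstack([X_lab, X_unlab[picked]])
            y_lab = np.concatenate([y_lab, new_y])
            X_unlab = np.delete(X_unlab, picked, axis=0)
        return clone(clf).fit(X_lab, y_lab)

The split mirrors the abstract's rationale: low-entropy predictions are cheap to absorb automatically, while high-entropy examples are exactly where a human label adds the most information to the labeled set.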
format Online
Article
Text
id pubmed-7514320
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-75143202020-11-09 Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme Fazakis, Nikos Kanas, Vasileios G. Aridas, Christos K. Karlos, Stamatis Kotsiantis, Sotiris Entropy (Basel) Article One of the major factors affecting the performance of classification algorithms is the amount of labeled data available during the training phase. It is widely accepted that labeling vast amounts of data is both expensive and time-consuming, since it requires human expertise. In a wide variety of scientific fields, unlabeled examples are easy to collect but hard to exploit in a way that enriches the information contained in a given dataset. In this context, a variety of learning methods that aim to efficiently utilize the vast amounts of unlabeled data during the learning process have been studied in the literature. The most common approaches tackle problems of this kind by applying either active learning or semi-supervised learning methods in isolation. In this work, a combination of active learning and semi-supervised learning methods is proposed under a common self-training scheme, in order to efficiently utilize the available unlabeled data. The effective and robust metrics of the entropy and the probability distribution of predictions over the unlabeled set are used to select the most suitable unlabeled examples for augmenting the initial labeled set. The superiority of the proposed scheme is validated by comparing it against the baseline approaches of supervised, semi-supervised, and active learning on a wide range of fifty-five benchmark datasets. MDPI 2019-10-10 /pmc/articles/PMC7514320/ http://dx.doi.org/10.3390/e21100988 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Fazakis, Nikos
Kanas, Vasileios G.
Aridas, Christos K.
Karlos, Stamatis
Kotsiantis, Sotiris
Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme
title Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme
title_full Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme
title_fullStr Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme
title_full_unstemmed Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme
title_short Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme
title_sort combination of active learning and semi-supervised learning under a self-training scheme
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514320/
http://dx.doi.org/10.3390/e21100988
work_keys_str_mv AT fazakisnikos combinationofactivelearningandsemisupervisedlearningunderaselftrainingscheme
AT kanasvasileiosg combinationofactivelearningandsemisupervisedlearningunderaselftrainingscheme
AT aridaschristosk combinationofactivelearningandsemisupervisedlearningunderaselftrainingscheme
AT karlosstamatis combinationofactivelearningandsemisupervisedlearningunderaselftrainingscheme
AT kotsiantissotiris combinationofactivelearningandsemisupervisedlearningunderaselftrainingscheme