Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme
One of the major factors affecting the performance of classification algorithms is the amount of labeled data available during the training phase. It is widely accepted that labeling vast amounts of data is both expensive and time-consuming, since it requires human expertise. In a wide variety of scientific fields, unlabeled examples are easy to collect but hard to exploit in a way that enriches the information contained in a dataset. In this context, a variety of learning methods that aim to efficiently utilize large amounts of unlabeled data during the learning process have been studied in the literature. The most common approaches tackle such problems by applying either active learning or semi-supervised learning methods in isolation. In this work, a combination of active learning and semi-supervised learning methods is proposed under a common self-training scheme, in order to efficiently utilize the available unlabeled data. Two effective and robust metrics, the entropy and the distribution of class probabilities over the unlabeled set, are used to select the most suitable unlabeled examples for augmenting the initial labeled set. The superiority of the proposed scheme is validated by comparing it against the base approaches of supervised, semi-supervised, and active learning on a wide range of fifty-five benchmark datasets.
Main Authors: | Fazakis, Nikos; Kanas, Vasileios G.; Aridas, Christos K.; Karlos, Stamatis; Kotsiantis, Sotiris |
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2019 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514320/ http://dx.doi.org/10.3390/e21100988 |
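The selection mechanism described in the abstract can be made concrete with a short, hypothetical Python example. This is a minimal sketch assuming a scikit-learn-style base learner with `predict_proba`; the names (`al_ssl_self_training`, `oracle`), the confidence fraction, and the query batch size are illustrative assumptions, not details taken from the paper. The idea: rank the unlabeled pool by prediction entropy, pseudo-label the lowest-entropy (most confident) examples in semi-supervised fashion, query a human oracle for the highest-entropy (most uncertain) ones, and retrain on the augmented labeled set.

```python
# Minimal sketch (not the authors' exact algorithm): self-training that
# combines semi-supervised pseudo-labeling with active-learning queries,
# both driven by the entropy of the predicted class distribution.
# The base learner, thresholds, and batch sizes are illustrative.
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


def prediction_entropy(proba):
    """Shannon entropy of each row of a class-probability matrix."""
    eps = 1e-12  # guard against log(0)
    return -np.sum(proba * np.log(proba + eps), axis=1)


def al_ssl_self_training(X_lab, y_lab, X_unlab, oracle,
                         base=LogisticRegression(max_iter=1000),
                         iterations=10, confident_frac=0.05, query_batch=5):
    """Iteratively augment the labeled set from the unlabeled pool."""
    X_lab, y_lab, X_unlab = map(np.asarray, (X_lab, y_lab, X_unlab))
    for _ in range(iterations):
        if len(X_unlab) == 0:
            break
        model = clone(base).fit(X_lab, y_lab)
        proba = model.predict_proba(X_unlab)
        order = np.argsort(prediction_entropy(proba))  # low -> high entropy

        # Semi-supervised step: pseudo-label the most confident
        # (lowest-entropy) examples with the model's own predictions.
        n_conf = max(1, int(confident_frac * len(X_unlab)))
        conf_idx = order[:n_conf]
        pseudo_y = model.classes_[proba[conf_idx].argmax(axis=1)]

        # Active-learning step: send the most uncertain (highest-entropy)
        # examples to a human oracle for their true labels.
        n_query = min(query_batch, len(X_unlab) - n_conf)
        query_idx = order[len(order) - n_query:]
        true_y = np.asarray([oracle(x) for x in X_unlab[query_idx]],
                            dtype=y_lab.dtype)

        # Move both batches from the unlabeled pool to the labeled set.
        taken = np.concatenate([conf_idx, query_idx])
        X_lab = np.vstack([X_lab, X_unlab[taken]])
        y_lab = np.concatenate([y_lab, pseudo_y, true_y])
        X_unlab = np.delete(X_unlab, taken, axis=0)
    return clone(base).fit(X_lab, y_lab)
```

In benchmark experiments such as those reported in the paper, the `oracle` callable would typically be simulated by revealing the held-out ground-truth label of each queried example.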
_version_ | 1783586560889847808 |
author | Fazakis, Nikos; Kanas, Vasileios G.; Aridas, Christos K.; Karlos, Stamatis; Kotsiantis, Sotiris
author_facet | Fazakis, Nikos; Kanas, Vasileios G.; Aridas, Christos K.; Karlos, Stamatis; Kotsiantis, Sotiris
author_sort | Fazakis, Nikos |
collection | PubMed |
description | One of the major factors affecting the performance of classification algorithms is the amount of labeled data available during the training phase. It is widely accepted that labeling vast amounts of data is both expensive and time-consuming, since it requires human expertise. In a wide variety of scientific fields, unlabeled examples are easy to collect but hard to exploit in a way that enriches the information contained in a dataset. In this context, a variety of learning methods that aim to efficiently utilize large amounts of unlabeled data during the learning process have been studied in the literature. The most common approaches tackle such problems by applying either active learning or semi-supervised learning methods in isolation. In this work, a combination of active learning and semi-supervised learning methods is proposed under a common self-training scheme, in order to efficiently utilize the available unlabeled data. Two effective and robust metrics, the entropy and the distribution of class probabilities over the unlabeled set, are used to select the most suitable unlabeled examples for augmenting the initial labeled set. The superiority of the proposed scheme is validated by comparing it against the base approaches of supervised, semi-supervised, and active learning on a wide range of fifty-five benchmark datasets.
format | Online Article Text |
id | pubmed-7514320 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7514320 2020-11-09 Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme Fazakis, Nikos; Kanas, Vasileios G.; Aridas, Christos K.; Karlos, Stamatis; Kotsiantis, Sotiris Entropy (Basel) Article One of the major factors affecting the performance of classification algorithms is the amount of labeled data available during the training phase. It is widely accepted that labeling vast amounts of data is both expensive and time-consuming, since it requires human expertise. In a wide variety of scientific fields, unlabeled examples are easy to collect but hard to exploit in a way that enriches the information contained in a dataset. In this context, a variety of learning methods that aim to efficiently utilize large amounts of unlabeled data during the learning process have been studied in the literature. The most common approaches tackle such problems by applying either active learning or semi-supervised learning methods in isolation. In this work, a combination of active learning and semi-supervised learning methods is proposed under a common self-training scheme, in order to efficiently utilize the available unlabeled data. Two effective and robust metrics, the entropy and the distribution of class probabilities over the unlabeled set, are used to select the most suitable unlabeled examples for augmenting the initial labeled set. The superiority of the proposed scheme is validated by comparing it against the base approaches of supervised, semi-supervised, and active learning on a wide range of fifty-five benchmark datasets. MDPI 2019-10-10 /pmc/articles/PMC7514320/ http://dx.doi.org/10.3390/e21100988 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle | Article; Fazakis, Nikos; Kanas, Vasileios G.; Aridas, Christos K.; Karlos, Stamatis; Kotsiantis, Sotiris; Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme
title | Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme |
title_full | Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme |
title_fullStr | Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme |
title_full_unstemmed | Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme |
title_short | Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme |
title_sort | combination of active learning and semi-supervised learning under a self-training scheme |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514320/ http://dx.doi.org/10.3390/e21100988 |
work_keys_str_mv | AT fazakisnikos combinationofactivelearningandsemisupervisedlearningunderaselftrainingscheme AT kanasvasileiosg combinationofactivelearningandsemisupervisedlearningunderaselftrainingscheme AT aridaschristosk combinationofactivelearningandsemisupervisedlearningunderaselftrainingscheme AT karlosstamatis combinationofactivelearningandsemisupervisedlearningunderaselftrainingscheme AT kotsiantissotiris combinationofactivelearningandsemisupervisedlearningunderaselftrainingscheme |