Active semi-supervised learning for biological data classification

As datasets have grown continuously, efforts have been made to address the large amount of unlabeled data relative to the scarcity of labeled data. Another important issue is the trade-off between the difficulty in obtaining annotations...

Bibliographic Details
Main authors: Camargo, Guilherme; Bugatti, Pedro H.; Saito, Priscila T. M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7437865/
https://www.ncbi.nlm.nih.gov/pubmed/32813738
http://dx.doi.org/10.1371/journal.pone.0237428
_version_ 1783572705711226880
author Camargo, Guilherme
Bugatti, Pedro H.
Saito, Priscila T. M.
author_facet Camargo, Guilherme
Bugatti, Pedro H.
Saito, Priscila T. M.
author_sort Camargo, Guilherme
collection PubMed
description As datasets have grown continuously, efforts have been made to address the large amount of unlabeled data relative to the scarcity of labeled data. Another important issue is the trade-off between the difficulty of obtaining annotations from a specialist and the need for a significant amount of annotated data to obtain a robust classifier. In this context, combining active learning with semi-supervised learning is attractive: a small number of the most informative samples, selected by the active learning strategy and labeled by a specialist, can propagate their labels to a set of unlabeled data through the semi-supervised technique. However, most works in the literature neglect the interactive response times that certain real applications require. We propose a more effective and efficient active semi-supervised learning framework, including a new active learning method. An extensive experimental evaluation was performed in the biological context (using the ALL-AML, Escherichia coli and PlantLeaves II datasets), comparing our proposals with state-of-the-art works and with different supervised (SVM, RF, OPF) and semi-supervised (YATSI-SVM, YATSI-RF and YATSI-OPF) classifiers. The results show the benefits of our framework, which allows the classifier to achieve higher accuracies more quickly with a reduced number of annotated samples. Moreover, the selection criterion adopted by our active learning method, based on diversity and uncertainty, prioritizes the most informative boundary samples for the learning process. We obtained a gain of up to 20% against other learning techniques. The active semi-supervised learning approaches presented a better trade-off between accuracy and competitive, viable computational times than the active supervised learning ones.
format Online
Article
Text
id pubmed-7437865
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7437865 2020-08-26 Active semi-supervised learning for biological data classification Camargo, Guilherme Bugatti, Pedro H. Saito, Priscila T. M. PLoS One Research Article Public Library of Science 2020-08-19 /pmc/articles/PMC7437865/ /pubmed/32813738 http://dx.doi.org/10.1371/journal.pone.0237428 Text en © 2020 Camargo et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Camargo, Guilherme
Bugatti, Pedro H.
Saito, Priscila T. M.
Active semi-supervised learning for biological data classification
title Active semi-supervised learning for biological data classification
title_full Active semi-supervised learning for biological data classification
title_fullStr Active semi-supervised learning for biological data classification
title_full_unstemmed Active semi-supervised learning for biological data classification
title_short Active semi-supervised learning for biological data classification
title_sort active semi-supervised learning for biological data classification
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7437865/
https://www.ncbi.nlm.nih.gov/pubmed/32813738
http://dx.doi.org/10.1371/journal.pone.0237428
work_keys_str_mv AT camargoguilherme activesemisupervisedlearningforbiologicaldataclassification
AT bugattipedroh activesemisupervisedlearningforbiologicaldataclassification
AT saitopriscilatm activesemisupervisedlearningforbiologicaldataclassification
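
The description field above outlines a loop in which an active learning step picks uncertain, diverse samples for a specialist to label, and a semi-supervised step spreads labels to the remaining data. The following is a minimal Python sketch of that general idea, not the authors' framework. It assumes a nearest-centroid classifier in place of SVM, RF or OPF, plain self-training in place of YATSI, and a toy two-cluster dataset. Every function name, parameter and threshold is hypothetical and chosen only for illustration.

```python
import math

def centroids(X, labels):
    """Mean vector per class for samples X with the given labels."""
    sums = {}
    for p, y in zip(X, labels):
        acc = sums.setdefault(y, [[0.0] * len(p), 0])
        acc[0] = [a + b for a, b in zip(acc[0], p)]
        acc[1] += 1
    return {y: [v / n for v in vec] for y, (vec, n) in sums.items()}

def predict(p, model):
    """Label of the nearest class centroid."""
    return min(model, key=lambda y: math.dist(p, model[y]))

def margin(p, model):
    """Gap between the two nearest centroids; a small gap means an uncertain sample.
    Requires at least two classes in the model."""
    d = sorted(math.dist(p, c) for c in model.values())
    return d[1] - d[0]

def fit(X, labeled, confident):
    """Self-training (assumed stand-in for the semi-supervised step):
    fit on expert labels, pseudo-label confident samples, then refit."""
    idx = list(labeled)
    base = centroids([X[i] for i in idx], [labeled[i] for i in idx])
    pseudo = {i: predict(X[i], base) for i in range(len(X))
              if i not in labeled and margin(X[i], base) >= confident}
    merged = {**labeled, **pseudo}
    idx = list(merged)
    return centroids([X[i] for i in idx], [merged[i] for i in idx]), pseudo

def select_queries(X, pool, model, k, n_candidates=10):
    """Uncertainty first (smallest margins), then diversity (farthest-first)."""
    ranked = sorted(pool, key=lambda i: margin(X[i], model))[:n_candidates]
    chosen = [ranked[0]]
    while len(chosen) < min(k, len(ranked)):
        rest = [i for i in ranked if i not in chosen]
        # Pick the candidate farthest from everything already chosen.
        chosen.append(max(rest, key=lambda i: min(math.dist(X[i], X[j]) for j in chosen)))
    return chosen

def active_semi_supervised(X, oracle, seed_idx, rounds=3, k=2, confident=1.0):
    """Alternate self-training with oracle queries for a fixed number of rounds."""
    labeled = {i: oracle(i) for i in seed_idx}  # seed must cover every class
    for _ in range(rounds):
        model = fit(X, labeled, confident)[0]
        pool = [i for i in range(len(X)) if i not in labeled]
        if not pool:
            break
        for i in select_queries(X, pool, model, k):
            labeled[i] = oracle(i)  # the oracle stands in for the specialist
    model, pseudo = fit(X, labeled, confident)
    return model, labeled, pseudo

# Toy data (an assumption for illustration): two well-separated 4x4 grids.
grid = [(float(a), float(b)) for a in range(4) for b in range(4)]
X = grid + [(a + 10.0, b + 10.0) for a, b in grid]
y = [0] * 16 + [1] * 16

model, labeled, pseudo = active_semi_supervised(
    X, oracle=lambda i: y[i], seed_idx=[0, 16])
accuracy = sum(predict(p, model) == t for p, t in zip(X, y)) / len(X)
print(len(labeled), "expert labels,", len(pseudo), "pseudo-labels, accuracy", accuracy)
```

In this sketch only a handful of samples are sent to the oracle, while the rest receive pseudo-labels from the classifier. On real biological data the candidate list size, the confidence threshold and the base classifier would all need tuning, and the published method may differ in each of these choices.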