A novel logistic regression model combining semi-supervised learning and active learning for disease classification
Traditional supervised classifiers need many labeled samples to perform well, but many biological datasets contain only a small number of labeled samples, with the rest unlabeled. Labeling these samples manually is difficult or expensive. Techn...
Main authors: | Chai, Hua; Liang, Yong; Wang, Sai; Shen, Hai-wei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6115447/ https://www.ncbi.nlm.nih.gov/pubmed/30158596 http://dx.doi.org/10.1038/s41598-018-31395-5 |
_version_ | 1783351386203750400 |
---|---|
author | Chai, Hua Liang, Yong Wang, Sai Shen, Hai-wei |
author_sort | Chai, Hua |
collection | PubMed |
description | Traditional supervised classifiers need many labeled samples to perform well, but many biological datasets contain only a small number of labeled samples, with the rest unlabeled. Labeling these samples manually is difficult or expensive. Techniques such as active learning and semi-supervised learning have been proposed to exploit unlabeled samples and improve model performance. However, active learning can be short-sighted or biased and still requires some manual work, while semi-supervised learning methods are easily affected by noisy samples. In this paper we propose a novel logistic regression model based on the complementarity of active learning and semi-supervised learning, which uses the unlabeled samples at the least cost to improve disease classification accuracy. In addition, a mechanism for updating pseudo-labeled samples is designed to reduce the number of falsely pseudo-labeled samples. The experimental results show that the new model achieves better performance than widely used semi-supervised learning and active learning methods in disease classification and gene selection. |
format | Online Article Text |
id | pubmed-6115447 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6115447 2018-09-04 Chai, Hua Liang, Yong Wang, Sai Shen, Hai-wei Sci Rep Article Nature Publishing Group UK 2018-08-29 /pmc/articles/PMC6115447/ /pubmed/30158596 http://dx.doi.org/10.1038/s41598-018-31395-5 Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | A novel logistic regression model combining semi-supervised learning and active learning for disease classification |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6115447/ https://www.ncbi.nlm.nih.gov/pubmed/30158596 http://dx.doi.org/10.1038/s41598-018-31395-5 |
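
The description field of this record outlines a logistic regression model that combines active learning with semi-supervised learning and updates its pseudo-labels over time. The record gives no algorithmic detail, so the following is only a minimal sketch of that general idea and not the authors' method. Everything in it is an assumption made for illustration: the function names, uncertainty sampling as the query rule, the 0.9 confidence threshold, the choice to recompute pseudo-labels from scratch each round, and the synthetic two-blob data used in place of a disease dataset.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    # Probability of class 1 under a (weights, bias) model.
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logreg(X, y, lr=0.5, epochs=300):
    # Unregularized logistic regression fitted by batch gradient descent.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errs = [predict_proba((w, b), xi) - yi for xi, yi in zip(X, y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errs, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b


def combined_training(X_lab, y_lab, X_unl, oracle,
                      rounds=5, n_query=2, conf=0.9):
    # Hypothetical combination loop (an assumption, not the published method).
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    model = fit_logreg(X_lab, y_lab)
    for _ in range(rounds):
        if not pool:
            break
        # Active learning step: send the least confident samples to the oracle.
        pool.sort(key=lambda x: abs(predict_proba(model, x) - 0.5))
        queried, pool = pool[:n_query], pool[n_query:]
        X_lab += queried
        y_lab += [oracle(x) for x in queried]
        # Semi-supervised step: pseudo-labels are recomputed every round from
        # a model trained on true labels only, so earlier pseudo-labels that
        # no longer pass the threshold are dropped or revised.
        base = fit_logreg(X_lab, y_lab)
        pseudo = []
        for x in pool:
            p = predict_proba(base, x)
            if max(p, 1.0 - p) >= conf:
                pseudo.append((x, 1 if p >= 0.5 else 0))
        model = fit_logreg(X_lab + [x for x, _ in pseudo],
                           y_lab + [lab for _, lab in pseudo])
    return model, len(y_lab)


def make_blobs(n_per_class, seed):
    # Synthetic stand-in data: two 2-D Gaussian blobs, one per class.
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            point = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
            data.append((point, label))
    rng.shuffle(data)
    return data


def demo():
    train, test = make_blobs(32, seed=0), make_blobs(100, seed=1)
    # Start with only two labeled samples per class.
    labelled = ([d for d in train if d[1] == 0][:2]
                + [d for d in train if d[1] == 1][:2])
    truth = dict(train)  # plays the role of the human annotator
    pool = [x for x, _ in train if (x, truth[x]) not in labelled]
    model, n_labels = combined_training(
        [x for x, _ in labelled], [lab for _, lab in labelled],
        pool, oracle=truth.__getitem__)
    correct = sum((predict_proba(model, x) >= 0.5) == bool(y) for x, y in test)
    return correct / len(test), n_labels
```

Calling `demo()` returns the held-out accuracy together with the number of true labels consumed, which is the manual labeling cost that the active learning step controls through `rounds` and `n_query`.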