
A self-training subspace clustering algorithm based on adaptive confidence for gene expression data

Gene clustering is an important technique for identifying co-expressed gene groups from gene expression data, and it provides a powerful tool for investigating the functional relationships of genes in biological processes. Self-training is an important semi-supervised learning method and has ex...

Bibliographic Details
Main Authors:	Li, Dan, Liang, Hongnan, Qin, Pan, Wang, Jia
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2023
Subjects:	Genetics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10070828/ https://www.ncbi.nlm.nih.gov/pubmed/37025450 http://dx.doi.org/10.3389/fgene.2023.1132370

author	Li, Dan Liang, Hongnan Qin, Pan Wang, Jia
collection	PubMed
description	Gene clustering is an important technique for identifying co-expressed gene groups from gene expression data, and it provides a powerful tool for investigating the functional relationships of genes in biological processes. Self-training is an important semi-supervised learning method that has performed well on gene clustering problems. However, the self-training process inevitably suffers from mislabeling, and accumulated mislabels degrade semi-supervised learning performance on gene expression data. To address this problem, this paper proposes a self-training subspace clustering algorithm based on adaptive confidence for gene expression data (SSCAC), which combines a low-rank representation of gene expression data with adaptive adjustment of label confidence to better guide the partition of unlabeled data. The advantages of the proposed SSCAC algorithm lie mainly in three aspects. 1) To improve the discriminative property of gene expression data, a low-rank representation with a distance penalty is used to mine the latent subspace structure of the data. 2) To account for mislabeling in self-training, a semi-supervised clustering objective function with label confidence is proposed, and a self-training subspace clustering framework is built on it. 3) To mitigate the negative impact of mislabeled data, an adaptive adjustment strategy for label confidence based on the gravitational search algorithm is proposed. In extensive experiments on two benchmark gene expression datasets, SSCAC outperformed a variety of state-of-the-art unsupervised and semi-supervised learning algorithms.
format	Online Article Text
id	pubmed-10070828
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-10070828 2023-04-05 A self-training subspace clustering algorithm based on adaptive confidence for gene expression data Li, Dan Liang, Hongnan Qin, Pan Wang, Jia Front Genet Genetics Frontiers Media S.A. 2023-03-21 /pmc/articles/PMC10070828/ /pubmed/37025450 http://dx.doi.org/10.3389/fgene.2023.1132370 Text en Copyright © 2023 Li, Liang, Qin and Wang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	A self-training subspace clustering algorithm based on adaptive confidence for gene expression data
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10070828/ https://www.ncbi.nlm.nih.gov/pubmed/37025450 http://dx.doi.org/10.3389/fgene.2023.1132370
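
The abstract in this record describes self-training in which pseudo-labels carry a confidence value that is revised over time. The sketch below illustrates that general idea only; it is not the SSCAC algorithm. It swaps in a nearest-centroid model with softmax-over-distance confidence in place of the paper's low-rank representation and gravitational search, and the function name `self_train`, the `threshold` parameter, and the toy data are all hypothetical.

```python
import numpy as np


def self_train(X, y, n_classes, threshold=0.8, max_iter=20):
    """Generic confidence-thresholded self-training (illustrative, not SSCAC).

    X: (n, d) array of samples. y: (n,) ints, with -1 marking unlabeled samples.
    Assumes every class has at least one labeled seed sample.
    Returns (labels, confidence); seed labels keep confidence 1.0.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(y, dtype=int).copy()
    seed = labels >= 0                # originally labeled samples never change
    conf = seed.astype(float)         # 1.0 for seeds, 0.0 for unlabeled
    for _ in range(max_iter):
        # Confidence-weighted centroid of each class.
        centroids = np.stack([
            np.average(X[labels == k], axis=0, weights=conf[labels == k])
            for k in range(n_classes)
        ])
        # Softmax over negative distances gives a per-class confidence.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        logits = -dist
        logits -= logits.max(axis=1, keepdims=True)
        prob = np.exp(logits)
        prob /= prob.sum(axis=1, keepdims=True)
        pred, pmax = prob.argmax(axis=1), prob.max(axis=1)
        # Re-score all pseudo-labels each round; low-confidence ones are dropped.
        keep = pmax >= threshold
        new_labels = np.where(seed, labels, np.where(keep, pred, -1))
        new_conf = np.where(seed, 1.0, np.where(keep, pmax, 0.0))
        converged = np.array_equal(new_labels, labels)
        labels, conf = new_labels, new_conf
        if converged:
            break
    return labels, conf


# Toy data: two well-separated groups with one labeled seed each.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y = [0, -1, -1, 1, -1, -1]
labels, conf = self_train(X, y, n_classes=2)
```

Because every pseudo-label is re-scored on each pass, a sample that was labeled early can lose its label later if the centroids move away from it, which is one simple way to limit the accumulation of mislabels.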
