MSC-CSMC: A multi-objective semi-supervised clustering algorithm based on constraints selection and multi-source constraints for gene expression data

Many clustering techniques have been proposed to group genes based on gene expression data. Among these methods, semi-supervised clustering techniques aim to improve clustering performance by incorporating supervisory information in the form of pairwise constraints. However, noisy constraints inevit...

Bibliographic Details
Main Authors: Wang, Zeyuan; Gu, Hong; Zhao, Minghui; Li, Dan; Wang, Jia
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10008853/
https://www.ncbi.nlm.nih.gov/pubmed/36923794
http://dx.doi.org/10.3389/fgene.2023.1135260
author Wang, Zeyuan
Gu, Hong
Zhao, Minghui
Li, Dan
Wang, Jia
collection PubMed
description Many clustering techniques have been proposed to group genes based on gene expression data. Among these methods, semi-supervised clustering techniques aim to improve clustering performance by incorporating supervisory information in the form of pairwise constraints. However, noisy constraints inevitably exist in the constraint set obtained from a practical unlabeled dataset, which degrades the performance of semi-supervised clustering. Moreover, existing methods do not integrate multiple information sources into multi-source constraints to improve clustering quality. To this end, this study proposes a new multi-objective semi-supervised clustering algorithm based on constraints selection and multi-source constraints (MSC-CSMC) for unlabeled gene expression data. The proposed method first uses the gene expression data and the gene ontology (GO), which describes gene annotation information, to form multi-source constraints. Then, the multi-source constraints are applied to the clustering by improving the constraint violation penalty weight in the semi-supervised clustering objective function. Furthermore, constraint selection and cluster prototypes are incorporated into a multi-objective evolutionary framework by adopting a mixed chromosome encoding strategy, which selects pairwise constraints suitable for the clustering task through synergistic optimization and thereby reduces the negative influence of noisy constraints. The proposed MSC-CSMC algorithm is tested on five benchmark gene expression datasets, and the results show that the proposed algorithm achieves superior performance.
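As a rough illustration of the idea in the description, the sketch below scores a clustering by its within-cluster distortion plus a weighted penalty for each violated pairwise constraint, with a boolean mask that switches individual constraints off. It is not the authors' objective function: the function name `penalized_objective`, the "ML"/"CL" labels, the per-constraint weights, and the toy data are all assumptions made for this example, and the record does not say how MSC-CSMC actually derives its weights from expression data and GO annotations.

```python
import numpy as np

def penalized_objective(X, centers, labels, constraints, weights, selected):
    """Within-cluster distortion plus penalties for violated, selected constraints.

    constraints: list of (i, j, kind), kind is "ML" (must-link) or "CL" (cannot-link).
    weights:     hypothetical per-constraint violation penalties.
    selected:    boolean mask; False drops a constraint suspected to be noisy.
    """
    # Sum of squared distances from each point to its assigned cluster prototype.
    distortion = float(np.sum((X - centers[labels]) ** 2))
    penalty = 0.0
    for (i, j, kind), w, keep in zip(constraints, weights, selected):
        if not keep:
            continue
        same = labels[i] == labels[j]
        violated = (kind == "ML" and not same) or (kind == "CL" and same)
        if violated:
            penalty += w
    return distortion + penalty

# Toy data (assumed): two well-separated groups of two points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
centers = np.array([[0.0, 0.5], [5.0, 5.5]])
labels = np.array([0, 0, 1, 1])

# The second constraint contradicts the data and plays the role of a noisy one.
constraints = [(0, 1, "ML"), (0, 2, "ML"), (2, 3, "CL")]
weights = [1.0, 2.0, 1.5]

full = penalized_objective(X, centers, labels, constraints, weights, [True, True, True])
filtered = penalized_objective(X, centers, labels, constraints, weights, [True, False, True])
print(full, filtered)
```

In a mixed encoding of the kind the description mentions, a mask like `selected` and the prototype coordinates in `centers` could sit side by side in one candidate solution, so that an evolutionary search adjusts both together. That pairing is again an assumption of this sketch.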
format Online
Article
Text
id pubmed-10008853
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10008853 2023-03-14 MSC-CSMC: A multi-objective semi-supervised clustering algorithm based on constraints selection and multi-source constraints for gene expression data Wang, Zeyuan Gu, Hong Zhao, Minghui Li, Dan Wang, Jia Front Genet Genetics Many clustering techniques have been proposed to group genes based on gene expression data. Among these methods, semi-supervised clustering techniques aim to improve clustering performance by incorporating supervisory information in the form of pairwise constraints. However, noisy constraints inevitably exist in the constraint set obtained from a practical unlabeled dataset, which degrades the performance of semi-supervised clustering. Moreover, existing methods do not integrate multiple information sources into multi-source constraints to improve clustering quality. To this end, this study proposes a new multi-objective semi-supervised clustering algorithm based on constraints selection and multi-source constraints (MSC-CSMC) for unlabeled gene expression data. The proposed method first uses the gene expression data and the gene ontology (GO), which describes gene annotation information, to form multi-source constraints. Then, the multi-source constraints are applied to the clustering by improving the constraint violation penalty weight in the semi-supervised clustering objective function. Furthermore, constraint selection and cluster prototypes are incorporated into a multi-objective evolutionary framework by adopting a mixed chromosome encoding strategy, which selects pairwise constraints suitable for the clustering task through synergistic optimization and thereby reduces the negative influence of noisy constraints. The proposed MSC-CSMC algorithm is tested on five benchmark gene expression datasets, and the results show that the proposed algorithm achieves superior performance. Frontiers Media S.A.
2023-02-27 /pmc/articles/PMC10008853/ /pubmed/36923794 http://dx.doi.org/10.3389/fgene.2023.1135260 Text en Copyright © 2023 Wang, Gu, Zhao, Li and Wang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title MSC-CSMC: A multi-objective semi-supervised clustering algorithm based on constraints selection and multi-source constraints for gene expression data
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10008853/
https://www.ncbi.nlm.nih.gov/pubmed/36923794
http://dx.doi.org/10.3389/fgene.2023.1135260