
Joint Lp-Norm and L(2,1)-Norm Constrained Graph Laplacian PCA for Robust Tumor Sample Clustering and Gene Network Module Discovery

The dimensionality reduction method accompanied by different norm constraints plays an important role in mining useful information from large-scale gene expression data. In this article, a novel method named Lp-norm and L(2,1)-norm constrained graph Laplacian principal component analysis (PL21GPCA)...
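The three ingredients named in the method's title (an Lp-norm loss, an L(2,1)-norm regularizer, and a graph Laplacian term) can be illustrated numerically. The sketch below uses common textbook definitions, not the article's exact objective, which this record does not give. The function names, the entrywise form of the Lp term, and the unnormalized Laplacian L = D - W are all assumptions made for illustration.

```python
import numpy as np

def l21_norm(M):
    """L(2,1)-norm: sum of the Euclidean norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

def lp_penalty(M, p):
    """Sum of |m_ij|^p over all entries, for 0 < p < 1 (assumed entrywise form)."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    return float(np.sum(np.abs(M) ** p))

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric affinity matrix W."""
    D = np.diag(W.sum(axis=1))
    return D - W

def laplacian_penalty(Q, L):
    """Graph smoothness term tr(Q^T L Q); small when connected samples have similar rows in Q."""
    return float(np.trace(Q.T @ L @ Q))

# Toy data (hypothetical values, chosen only to make the arithmetic easy to follow).
M = np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 1.0]])
W = np.array([[0.0, 1.0], [1.0, 0.0]])   # two samples joined by one edge
Q = np.array([[1.0], [3.0]])             # one-dimensional embedding of the two samples

print(l21_norm(M))                               # 6.0  (row norms 5 + 0 + 1)
print(lp_penalty(M, 0.5))
print(laplacian_penalty(Q, graph_laplacian(W)))  # 4.0  (edge weight 1 times (1 - 3)^2)
```

Rows of all zeros contribute nothing to the L(2,1)-norm, which is why it is commonly used to encourage row sparsity, while a fractional p makes large residuals count for less than they would under a squared loss.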


Bibliographic Details
Main Authors: Kong, Xiang-Zhen, Song, Yu, Liu, Jin-Xing, Zheng, Chun-Hou, Yuan, Sha-Sha, Wang, Juan, Dai, Ling-Yun
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7940841/
https://www.ncbi.nlm.nih.gov/pubmed/33708239
http://dx.doi.org/10.3389/fgene.2021.621317
_version_ 1783662028107284480
author Kong, Xiang-Zhen
Song, Yu
Liu, Jin-Xing
Zheng, Chun-Hou
Yuan, Sha-Sha
Wang, Juan
Dai, Ling-Yun
collection PubMed
description Dimensionality reduction methods with different norm constraints play an important role in mining useful information from large-scale gene expression data. In this article, a novel method named Lp-norm and L(2,1)-norm constrained graph Laplacian principal component analysis (PL21GPCA), based on traditional principal component analysis (PCA), is proposed for robust tumor sample clustering and gene network module discovery. Three aspects are highlighted in the PL21GPCA method. First, to reduce the high sensitivity to outliers and noise, the non-convex proximal Lp-norm (0 < p < 1) constraint is applied to the loss function. Second, to enhance the sparsity of gene expression in cancer samples, the L(2,1)-norm constraint is used on one of the regularization terms. Third, to retain the geometric structure of the data, a graph Laplacian regularization term is introduced into the PL21GPCA optimization model. Extensive experiments on five gene expression datasets, including one benchmark dataset, two single-cancer datasets from The Cancer Genome Atlas (TCGA), and two integrated datasets of multiple cancers from TCGA, are performed to validate the effectiveness of the method. The experimental results demonstrate that the PL21GPCA method performs better than many other methods in terms of tumor sample clustering. Additionally, the method is used to discover gene network modules in order to find key genes that may be associated with some cancers.
format Online
Article
Text
id pubmed-7940841
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7940841 2021-03-10 Front Genet Genetics Frontiers Media S.A. 2021-02-23 /pmc/articles/PMC7940841/ /pubmed/33708239 http://dx.doi.org/10.3389/fgene.2021.621317 Text en Copyright © 2021 Kong, Song, Liu, Zheng, Yuan, Wang and Dai.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Joint Lp-Norm and L(2,1)-Norm Constrained Graph Laplacian PCA for Robust Tumor Sample Clustering and Gene Network Module Discovery
topic Genetics