PCA via joint graph Laplacian and sparse constraint: Identification of differentially expressed genes and sample clustering on gene expression data
Main authors: | Feng, Chun-Mei; Xu, Yong; Hou, Mi-Xiao; Dai, Ling-Yun; Shang, Jun-Liang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6936054/ https://www.ncbi.nlm.nih.gov/pubmed/31888433 http://dx.doi.org/10.1186/s12859-019-3229-z |
author | Feng, Chun-Mei; Xu, Yong; Hou, Mi-Xiao; Dai, Ling-Yun; Shang, Jun-Liang |
---|---|
collection | PubMed |
description | BACKGROUND: In recent years, identifying differentially expressed genes and clustering samples have become hot topics in bioinformatics. Principal Component Analysis (PCA) is widely used to analyze gene expression data. However, it has two limitations. First, it does not explore the geometric structure hidden in the data, e.g., the pairwise distances between data points, even though this information can facilitate sample clustering. Second, the Principal Components (PCs) determined by PCA are dense, which makes them hard to interpret, whereas only a few genes are actually related to cancer. Identifying this handful of differentially expressed genes and finding new cancer biomarkers is of great significance for the early diagnosis and treatment of cancer. RESULTS: In this study, a new method, gLSPCA, is proposed that integrates both a graph Laplacian and a sparsity constraint into PCA. On the one hand, gLSPCA improves clustering accuracy by exploring the internal geometric structure of the data; on the other hand, it identifies differentially expressed genes by imposing a sparsity constraint on the PCs. CONCLUSIONS: gLSPCA is evaluated and compared with existing methods, including Z-SPCA, GPower, PathSPCA, SPCArt, and gLPCA, on real datasets of pancreatic cancer (PAAD) and head and neck squamous carcinoma (HNSC). The results demonstrate that gLSPCA is effective both in identifying differentially expressed genes and in clustering samples. In addition, applying gLSPCA to these datasets provides several new clues for exploring the causative factors of PAAD and HNSC. |
id | pubmed-6936054 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
source | BMC Bioinformatics (Research article), BioMed Central, published 2019-12-30 |
rights | © The Author(s). 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | PCA via joint graph Laplacian and sparse constraint: Identification of differentially expressed genes and sample clustering on gene expression data |
topic | Research |
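The abstract describes gLSPCA only through its two ingredients: a graph Laplacian over samples, which makes the components respect the data's geometric structure, and a sparsity constraint that leaves only a few genes with nonzero loadings. The sketch that follows illustrates those two ingredients on a toy matrix. It is not the authors' gLSPCA algorithm, whose objective and update rules are given in the full text. Assumptions, all hypothetical: X is a genes × samples matrix that is already row-centered; the sample graph is a symmetrized k-nearest-neighbor graph with 0/1 weights; the graph-regularized step uses the closed-form gLPCA-style solution (eigenvectors of G = -XᵀX + αL with the smallest eigenvalues); and sparsity is imposed as an L2,1 (row-group) penalty on the gene loadings U, solved for a fixed V by group soft-thresholding instead of by joint optimization. The function names, the parameters `k`, `alpha`, `beta`, and the toy data are illustrative only; plain Python is used so the block has no dependencies.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def knn_laplacian(samples: Matrix, k: int) -> Matrix:
    """Unnormalized Laplacian L = D - W of a symmetrized k-NN graph with 0/1 weights."""
    n = len(samples)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (sum((a - b) * (a - b) for a, b in zip(samples[i], samples[j])), j)
            for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            w[i][j] = w[j][i] = 1.0  # edge if either sample is among the other's k nearest
    deg = [sum(row) for row in w]
    return [[(deg[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


def jacobi_eigh(a: Matrix, sweeps: int = 100, tol: float = 1e-12) -> Tuple[List[float], Matrix]:
    """Cyclic Jacobi eigensolver for a symmetric matrix.
    Returns eigenvalues in ascending order and the matching eigenvectors as columns."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):  # A <- A J (columns p, q)
                    arp, arq = a[r][p], a[r][q]
                    a[r][p], a[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(n):  # A <- J^T A (rows p, q)
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r], a[q][r] = c * apr - s * aqr, s * apr + c * aqr
                for r in range(n):  # accumulate eigenvectors V <- V J
                    vrp, vrq = v[r][p], v[r][q]
                    v[r][p], v[r][q] = c * vrp - s * vrq, s * vrp + c * vrq
    order = sorted(range(n), key=lambda i: a[i][i])
    return [a[i][i] for i in order], [[v[r][i] for i in order] for r in range(n)]


def glpca_embedding(x: Matrix, lap: Matrix, k: int, alpha: float) -> Matrix:
    """gLPCA-style sample embedding V (n x k): eigenvectors of G = -X^T X + alpha * L
    with the k smallest eigenvalues. x is genes x samples and assumed row-centered."""
    n = len(x[0])
    g = [[alpha * lap[i][j] - sum(row[i] * row[j] for row in x) for j in range(n)]
         for i in range(n)]
    _, vecs = jacobi_eigh(g)
    return [row[:k] for row in vecs]


def sparse_loadings(x: Matrix, v: Matrix, beta: float) -> Matrix:
    """Minimizer of ||X - U V^T||_F^2 + beta * ||U||_{2,1} over U for a fixed
    orthonormal V: group soft-thresholding of each row of X V at beta / 2."""
    k = len(v[0])
    u = []
    for row in x:
        z = [sum(row[j] * v[j][c] for j in range(len(row))) for c in range(k)]
        norm = math.sqrt(sum(t * t for t in z))
        scale = max(0.0, 1.0 - beta / (2.0 * norm)) if norm > 0 else 0.0
        u.append([scale * t for t in z])  # rows shrunk to zero = genes not selected
    return u


# Hypothetical toy data: 4 genes x 6 samples in two groups of 3 samples.
# Genes 0 and 1 differ between the groups; genes 2 and 3 are small noise.
x = [
    [3.0, 3.2, 2.8, -3.0, -3.1, -2.9],
    [2.0, 2.1, 1.9, -2.0, -1.9, -2.1],
    [0.1, -0.1, 0.0, 0.1, -0.1, 0.0],
    [-0.05, 0.05, 0.0, 0.0, 0.05, -0.05],
]
samples = [list(col) for col in zip(*x)]  # one vector per sample
lap = knn_laplacian(samples, k=2)
v = glpca_embedding(x, lap, k=1, alpha=1.0)
u = sparse_loadings(x, v, beta=1.0)
selected = [i for i, row in enumerate(u) if any(t != 0.0 for t in row)]
print("selected genes:", selected)  # → selected genes: [0, 1]
```

On this toy matrix the first graph-regularized component separates the two sample groups by sign, and the L2,1 step zeroes the loadings of the two noise genes. The hand-written Jacobi routine only keeps the sketch dependency-free; for real expression matrices a LAPACK-backed solver such as `numpy.linalg.eigh` would be the usual choice.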