SPARCoC: A New Framework for Molecular Pattern Discovery and Cancer Gene Identification

Bibliographic Details
Main authors: Ma, Shiqian; Johnson, Daniel; Ashby, Cody; Xiong, Donghai; Cramer, Carole L.; Moore, Jason H.; Zhang, Shuzhong; Huang, Xiuzhen
Format: Online article (text)
Language: English
Published: Public Library of Science, 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4359112/
https://www.ncbi.nlm.nih.gov/pubmed/25768286
http://dx.doi.org/10.1371/journal.pone.0117135

Abstract: It is challenging to cluster cancer patients of a given histopathological type into clinically important molecular subtypes and to identify gene signatures directly relevant to those subtypes. Current clustering approaches have inherent limitations that prevent them from gauging the subtle heterogeneity of the molecular subtypes. In this paper we present a new framework, SPARCoC (Sparse-CoClust), which is based on a novel Common-background and Sparse-foreground Decomposition (CSD) model and the Maximum Block Improvement (MBI) co-clustering technique. SPARCoC has clear advantages over two widely used alternative approaches: hierarchical clustering (Hclust) and nonnegative matrix factorization (NMF). We apply SPARCoC to the study of lung adenocarcinoma (ADCA), an extremely heterogeneous histological type that poses a significant challenge for molecular subtyping. For testing and verification, we use high-quality gene expression profiling data from lung ADCA patients and identify prognostic gene signatures that cluster patients into subgroups with significantly different overall survival (p-values < 0.05). Our results are based solely on the analysis of gene expression profiling data, without incorporating any other feature selection or clinical information, and we are able to replicate our findings in completely independent datasets. SPARCoC is broadly applicable to large-scale genomic data for pattern discovery and cancer gene identification.

Journal: PLoS One (Research Article)
Publication date: 2015-03-13
Copyright: © 2015 Ma et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.