
mCOPA: analysis of heterogeneous features in cancer expression data

BACKGROUND: Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers,...


Bibliographic Details
Main authors: Wang, Chenwei; Taciroglu, Alperen; Maetschke, Stefan R; Nelson, Colleen C; Ragan, Mark A; Davis, Melissa J
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects: Methodology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3553066/
https://www.ncbi.nlm.nih.gov/pubmed/23216803
http://dx.doi.org/10.1186/2043-9113-2-22
author Wang, Chenwei
Taciroglu, Alperen
Maetschke, Stefan R
Nelson, Colleen C
Ragan, Mark A
Davis, Melissa J
collection PubMed
description BACKGROUND: Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset.
RESULTS: We compare our method to other feature-selection approaches, and demonstrate that mCOPA frequently selects more-informative features than do differential expression or variance-based feature selection approaches, and is able to recover observed clinical subtypes more consistently. We demonstrate the application of mCOPA to prostate cancer expression data, and explore the use of outliers in clustering, pathway analysis, and the identification of tumour suppressors. We analyse the under-expressed outliers to identify known and novel prostate cancer tumour suppressor genes, validating these against data in Oncomine and the Cancer Gene Index. We also demonstrate how a combination of outlier analysis and pathway analysis can identify molecular mechanisms disrupted in individual tumours.
CONCLUSIONS: We demonstrate that mCOPA offers advantages, compared to differential expression or variance, in selecting outlier features, and that the features so selected are better able to assign samples to clinically annotated subtypes. Further, we show that the biology explored by outlier analysis differs from that uncovered in differential expression or variance analysis. mCOPA is an important new tool for the exploration of cancer datasets and the discovery of new cancer subtypes, and can be combined with pathway and functional analysis approaches to discover mechanisms underpinning heterogeneity in cancers.
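The outlier idea summarised in the abstract can be sketched in a few lines. What follows is a minimal illustration of the published COPA transform (median-centre each gene, scale by the median absolute deviation, then read off a high percentile), extended with a low percentile so that under-expressed outliers are also scored. It is not the authors' mCOPA code: the function names, the 95th/5th percentile defaults, and the interpolation rule are assumptions made for illustration, and the refinements mentioned in the abstract are not reproduced here.

```python
from statistics import median


def copa_transform(values):
    """Median-centre one gene's expression values and scale by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the gene has no spread to scale by")
    return [(v - med) / mad for v in values]


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def outlier_scores(values, upper_q=95, lower_q=5):
    """Return (over_score, under_score) for one gene.

    A large positive over_score suggests a subset of samples with unusually
    high expression; a large negative under_score suggests a subset with
    unusually low expression. The cut-offs are illustrative assumptions.
    """
    z = sorted(copa_transform(values))
    return percentile(z, upper_q), percentile(z, lower_q)


# Hypothetical gene: eight samples near 5, two samples strongly over-expressed.
gene_up = [4, 5, 6, 5, 4, 6, 5, 5, 20, 22]
# Mirror image: the same two samples are strongly under-expressed instead.
gene_down = [-v for v in gene_up]

over_up, under_up = outlier_scores(gene_up)
over_down, under_down = outlier_scores(gene_down)
```

Ranking genes by `over_score` alone corresponds to the over-expression-only analysis that the abstract attributes to the original COPA; ranking by the magnitude of `under_score` as well is one simple way to surface candidate tumour suppressors.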
format Online
Article
Text
id pubmed-3553066
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3553066 2013-01-28
mCOPA: analysis of heterogeneous features in cancer expression data
Wang, Chenwei; Taciroglu, Alperen; Maetschke, Stefan R; Nelson, Colleen C; Ragan, Mark A; Davis, Melissa J
J Clin Bioinforma
Methodology
BioMed Central 2012-12-10
/pmc/articles/PMC3553066/ /pubmed/23216803 http://dx.doi.org/10.1186/2043-9113-2-22
Text en
Copyright ©2012 Wang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title mCOPA: analysis of heterogeneous features in cancer expression data
topic Methodology