mCOPA: analysis of heterogeneous features in cancer expression data
BACKGROUND: Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers,...
Main Authors: | Wang, Chenwei; Taciroglu, Alperen; Maetschke, Stefan R; Nelson, Colleen C; Ragan, Mark A; Davis, Melissa J |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Methodology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3553066/ https://www.ncbi.nlm.nih.gov/pubmed/23216803 http://dx.doi.org/10.1186/2043-9113-2-22 |
_version_ | 1782256776892121088 |
---|---|
author | Wang, Chenwei; Taciroglu, Alperen; Maetschke, Stefan R; Nelson, Colleen C; Ragan, Mark A; Davis, Melissa J |
author_facet | Wang, Chenwei; Taciroglu, Alperen; Maetschke, Stefan R; Nelson, Colleen C; Ragan, Mark A; Davis, Melissa J |
author_sort | Wang, Chenwei |
collection | PubMed |
description | BACKGROUND: Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset. RESULTS: We compare our method to other feature-selection approaches, and demonstrate that mCOPA frequently selects more-informative features than do differential expression or variance-based feature selection approaches, and is able to recover observed clinical subtypes more consistently. We demonstrate the application of mCOPA to prostate cancer expression data, and explore the use of outliers in clustering, pathway analysis, and the identification of tumour suppressors. We analyse the under-expressed outliers to identify known and novel prostate cancer tumour suppressor genes, validating these against data in Oncomine and the Cancer Gene Index. We also demonstrate how a combination of outlier analysis and pathway analysis can identify molecular mechanisms disrupted in individual tumours. CONCLUSIONS: We demonstrate that mCOPA offers advantages, compared to differential expression or variance, in selecting outlier features, and that the features so selected are better able to assign samples to clinically annotated subtypes. Further, we show that the biology explored by outlier analysis differs from that uncovered in differential expression or variance analysis. mCOPA is an important new tool for the exploration of cancer datasets and the discovery of new cancer subtypes, and can be combined with pathway and functional analysis approaches to discover mechanisms underpinning heterogeneity in cancers. |
format | Online Article Text |
id | pubmed-3553066 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3553066 2013-01-28 mCOPA: analysis of heterogeneous features in cancer expression data Wang, Chenwei Taciroglu, Alperen Maetschke, Stefan R Nelson, Colleen C Ragan, Mark A Davis, Melissa J J Clin Bioinforma Methodology BACKGROUND: Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset. RESULTS: We compare our method to other feature-selection approaches, and demonstrate that mCOPA frequently selects more-informative features than do differential expression or variance-based feature selection approaches, and is able to recover observed clinical subtypes more consistently. We demonstrate the application of mCOPA to prostate cancer expression data, and explore the use of outliers in clustering, pathway analysis, and the identification of tumour suppressors. We analyse the under-expressed outliers to identify known and novel prostate cancer tumour suppressor genes, validating these against data in Oncomine and the Cancer Gene Index. We also demonstrate how a combination of outlier analysis and pathway analysis can identify molecular mechanisms disrupted in individual tumours. CONCLUSIONS: We demonstrate that mCOPA offers advantages, compared to differential expression or variance, in selecting outlier features, and that the features so selected are better able to assign samples to clinically annotated subtypes. Further, we show that the biology explored by outlier analysis differs from that uncovered in differential expression or variance analysis. mCOPA is an important new tool for the exploration of cancer datasets and the discovery of new cancer subtypes, and can be combined with pathway and functional analysis approaches to discover mechanisms underpinning heterogeneity in cancers. BioMed Central 2012-12-10 /pmc/articles/PMC3553066/ /pubmed/23216803 http://dx.doi.org/10.1186/2043-9113-2-22 Text en Copyright ©2012 Wang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Wang, Chenwei Taciroglu, Alperen Maetschke, Stefan R Nelson, Colleen C Ragan, Mark A Davis, Melissa J mCOPA: analysis of heterogeneous features in cancer expression data |
title | mCOPA: analysis of heterogeneous features in cancer expression data |
title_full | mCOPA: analysis of heterogeneous features in cancer expression data |
title_fullStr | mCOPA: analysis of heterogeneous features in cancer expression data |
title_full_unstemmed | mCOPA: analysis of heterogeneous features in cancer expression data |
title_short | mCOPA: analysis of heterogeneous features in cancer expression data |
title_sort | mcopa: analysis of heterogeneous features in cancer expression data |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3553066/ https://www.ncbi.nlm.nih.gov/pubmed/23216803 http://dx.doi.org/10.1186/2043-9113-2-22 |
work_keys_str_mv | AT wangchenwei mcopaanalysisofheterogeneousfeaturesincancerexpressiondata AT taciroglualperen mcopaanalysisofheterogeneousfeaturesincancerexpressiondata AT maetschkestefanr mcopaanalysisofheterogeneousfeaturesincancerexpressiondata AT nelsoncolleenc mcopaanalysisofheterogeneousfeaturesincancerexpressiondata AT raganmarka mcopaanalysisofheterogeneousfeaturesincancerexpressiondata AT davismelissaj mcopaanalysisofheterogeneousfeaturesincancerexpressiondata |
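
The description in this record states that mCOPA identifies both over- and under-expressed outliers, but it does not spell out the algorithm. As a rough illustration only, the sketch below applies the commonly described COPA-style transform (median-centre each gene, scale by the median absolute deviation) and then reads off a percentile from each tail. The function names `copa_transform`, `percentile` and `outlier_scores`, the choice of the 90th percentile, and the lower-tail score are assumptions made for this example; they are not taken from the mCOPA package and do not reproduce its refinements.

```python
from statistics import median


def copa_transform(values):
    """Median-centre one gene's expression values and scale by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Assumption for this sketch: a gene with zero MAD gets all-zero scores.
        return [0.0 for _ in values]
    return [(v - med) / mad for v in values]


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def outlier_scores(values, q=90):
    """Return (upper, lower) tail scores for one gene.

    The upper score is the q-th percentile of the transformed values
    (over-expression); the lower score is the (100 - q)-th percentile
    (under-expression). A large positive upper score or a large negative
    lower score suggests a subset of samples with outlying expression.
    """
    scaled = sorted(copa_transform(values))
    return percentile(scaled, q), percentile(scaled, 100 - q)


# Hypothetical expression values for one gene across ten samples;
# a single sample (20) is strongly over-expressed.
gene = [4, 5, 6, 5, 4, 6, 5, 5, 20, 5]
upper, lower = outlier_scores(gene)
print(upper, lower)
```

Ranking genes by the upper score and, separately, by the negated lower score would give two candidate lists, one per tail. Median and MAD are used rather than mean and standard deviation so that the outlying samples themselves do not inflate the scale.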