Cargando…
Robust sparse canonical correlation analysis
BACKGROUND: Canonical correlation analysis (CCA) is a multivariate statistical method which describes the associations between two sets of variables. The objective is to find linear combinations of the variables in each data set having maximal correlation. In genomics, CCA has become increasingly im...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2016
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4982144/ https://www.ncbi.nlm.nih.gov/pubmed/27516087 http://dx.doi.org/10.1186/s12918-016-0317-9 |
_version_ | 1782447722719084544 |
---|---|
author | Wilms, Ines Croux, Christophe |
author_facet | Wilms, Ines Croux, Christophe |
author_sort | Wilms, Ines |
collection | PubMed |
description | BACKGROUND: Canonical correlation analysis (CCA) is a multivariate statistical method which describes the associations between two sets of variables. The objective is to find linear combinations of the variables in each data set having maximal correlation. In genomics, CCA has become increasingly important to estimate the associations between gene expression data and DNA copy number change data. The identification of such associations might help to increase our understanding of the development of diseases such as cancer. However, these data sets are typically high-dimensional, containing a lot of variables relative to the number of objects. Moreover, the data sets might contain atypical observations since it is likely that objects react differently to treatments. We discuss a method for Robust Sparse CCA, thereby providing a solution to both issues. Sparse estimation produces canonical vectors with some of their elements estimated as exactly zero. As such, their interpretability is improved. Robust methods can cope with atypical observations in the data. RESULTS: We illustrate the good performance of the Robust Sparse CCA method by several simulation studies and three biometric examples. Robust Sparse CCA considerably outperforms its main alternatives in (1) correctly detecting the main associations between the data sets, in (2) accurately estimating these associations, and in (3) detecting outliers. CONCLUSIONS: Robust Sparse CCA delivers interpretable canonical vectors, while at the same time coping with outlying observations. The proposed method is able to describe the associations between high-dimensional data sets, which are nowadays commonplace in genomics. Furthermore, the Robust Sparse CCA method allows to characterize outliers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12918-016-0317-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4982144 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-49821442016-08-13 Robust sparse canonical correlation analysis Wilms, Ines Croux, Christophe BMC Syst Biol Methodology Article BACKGROUND: Canonical correlation analysis (CCA) is a multivariate statistical method which describes the associations between two sets of variables. The objective is to find linear combinations of the variables in each data set having maximal correlation. In genomics, CCA has become increasingly important to estimate the associations between gene expression data and DNA copy number change data. The identification of such associations might help to increase our understanding of the development of diseases such as cancer. However, these data sets are typically high-dimensional, containing a lot of variables relative to the number of objects. Moreover, the data sets might contain atypical observations since it is likely that objects react differently to treatments. We discuss a method for Robust Sparse CCA, thereby providing a solution to both issues. Sparse estimation produces canonical vectors with some of their elements estimated as exactly zero. As such, their interpretability is improved. Robust methods can cope with atypical observations in the data. RESULTS: We illustrate the good performance of the Robust Sparse CCA method by several simulation studies and three biometric examples. Robust Sparse CCA considerably outperforms its main alternatives in (1) correctly detecting the main associations between the data sets, in (2) accurately estimating these associations, and in (3) detecting outliers. CONCLUSIONS: Robust Sparse CCA delivers interpretable canonical vectors, while at the same time coping with outlying observations. The proposed method is able to describe the associations between high-dimensional data sets, which are nowadays commonplace in genomics. Furthermore, the Robust Sparse CCA method allows to characterize outliers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12918-016-0317-9) contains supplementary material, which is available to authorized users. BioMed Central 2016-08-11 /pmc/articles/PMC4982144/ /pubmed/27516087 http://dx.doi.org/10.1186/s12918-016-0317-9 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Wilms, Ines Croux, Christophe Robust sparse canonical correlation analysis |
title | Robust sparse canonical correlation analysis |
title_full | Robust sparse canonical correlation analysis |
title_fullStr | Robust sparse canonical correlation analysis |
title_full_unstemmed | Robust sparse canonical correlation analysis |
title_short | Robust sparse canonical correlation analysis |
title_sort | robust sparse canonical correlation analysis |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4982144/ https://www.ncbi.nlm.nih.gov/pubmed/27516087 http://dx.doi.org/10.1186/s12918-016-0317-9 |
work_keys_str_mv | AT wilmsines robustsparsecanonicalcorrelationanalysis AT crouxchristophe robustsparsecanonicalcorrelationanalysis |