Robust sparse canonical correlation analysis

Bibliographic Details
Main Authors: Wilms, Ines; Croux, Christophe
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4982144/
https://www.ncbi.nlm.nih.gov/pubmed/27516087
http://dx.doi.org/10.1186/s12918-016-0317-9
author Wilms, Ines
Croux, Christophe
collection PubMed
description BACKGROUND: Canonical correlation analysis (CCA) is a multivariate statistical method which describes the associations between two sets of variables. The objective is to find linear combinations of the variables in each data set having maximal correlation. In genomics, CCA has become increasingly important to estimate the associations between gene expression data and DNA copy number change data. The identification of such associations might help to increase our understanding of the development of diseases such as cancer. However, these data sets are typically high-dimensional, containing a lot of variables relative to the number of objects. Moreover, the data sets might contain atypical observations since it is likely that objects react differently to treatments. We discuss a method for Robust Sparse CCA, thereby providing a solution to both issues. Sparse estimation produces canonical vectors with some of their elements estimated as exactly zero. As such, their interpretability is improved. Robust methods can cope with atypical observations in the data. RESULTS: We illustrate the good performance of the Robust Sparse CCA method by several simulation studies and three biometric examples. Robust Sparse CCA considerably outperforms its main alternatives in (1) correctly detecting the main associations between the data sets, in (2) accurately estimating these associations, and in (3) detecting outliers. CONCLUSIONS: Robust Sparse CCA delivers interpretable canonical vectors, while at the same time coping with outlying observations. The proposed method is able to describe the associations between high-dimensional data sets, which are nowadays commonplace in genomics. Furthermore, the Robust Sparse CCA method allows to characterize outliers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12918-016-0317-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4982144
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4982144 2016-08-13 Robust sparse canonical correlation analysis Wilms, Ines Croux, Christophe BMC Syst Biol Methodology Article BioMed Central 2016-08-11 /pmc/articles/PMC4982144/ /pubmed/27516087 http://dx.doi.org/10.1186/s12918-016-0317-9 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Robust sparse canonical correlation analysis
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4982144/
https://www.ncbi.nlm.nih.gov/pubmed/27516087
http://dx.doi.org/10.1186/s12918-016-0317-9