
Sparse kernel canonical correlation analysis for discovery of nonlinear interactions in high-dimensional data


Bibliographic Details
Main authors: Yoshida, Kosuke; Yoshimoto, Junichiro; Doya, Kenji
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5310015/
https://www.ncbi.nlm.nih.gov/pubmed/28196464
http://dx.doi.org/10.1186/s12859-017-1543-x
Description
Summary: BACKGROUND: Advances in high-throughput technologies in genomics, transcriptomics, and metabolomics have created demand for bioinformatics tools to integrate high-dimensional data from different sources. Canonical correlation analysis (CCA) is a statistical tool for finding linear associations between different types of information. Previous extensions of CCA used to capture nonlinear associations, such as kernel CCA, did not allow feature selection or the capture of multiple canonical components. Here we propose a novel method, two-stage kernel CCA (TSKCCA), to select appropriate kernels in the framework of multiple kernel learning. RESULTS: TSKCCA first selects relevant kernels based on the Hilbert-Schmidt independence criterion (HSIC) in the multiple kernel learning framework. Weights are then derived by non-negative matrix decomposition with L1 regularization. Using artificial datasets and nutrigenomic datasets, we show that TSKCCA can extract multiple, nonlinear associations among high-dimensional data and multiplicative interactions among variables. CONCLUSIONS: TSKCCA can identify nonlinear associations among high-dimensional data more reliably than previous nonlinear CCA methods.
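
The summary names HSIC as the criterion used to select kernels. As a rough illustration of what that criterion measures, the sketch below computes the standard biased empirical HSIC estimate, trace(K H L H) / n^2, between two one-dimensional samples. It is not the authors' implementation and does not reproduce the TSKCCA kernel-selection or weighting stages; the Gaussian kernel, the bandwidth `sigma=1.0`, the toy data, and all function names are assumptions made for illustration only.

```python
import math
import random


def gaussian_gram(x, sigma):
    """Gram matrix of a Gaussian (RBF) kernel over a list of scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in x] for a in x]


def center(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) * ones."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]


def hsic(K, L):
    """Biased empirical HSIC, trace(K H L H) / n^2, from two Gram matrices."""
    n = len(K)
    Kc, Lc = center(K), center(L)
    # For symmetric centered matrices, trace(Kc Lc) is the elementwise sum.
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / n ** 2


# Toy data (assumed): y_dep depends on x nonlinearly, y_ind is unrelated to x.
rng = random.Random(0)
n = 80
x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
y_dep = [v ** 2 + 0.1 * rng.gauss(0.0, 1.0) for v in x]
y_ind = [rng.uniform(0.0, 4.0) for _ in range(n)]

Kx = gaussian_gram(x, sigma=1.0)
hsic_dep = hsic(Kx, gaussian_gram(y_dep, sigma=1.0))
hsic_ind = hsic(Kx, gaussian_gram(y_ind, sigma=1.0))
print("HSIC(x, x^2 + noise):", hsic_dep)
print("HSIC(x, independent):", hsic_ind)
```

The quadratic relationship has near-zero linear correlation with `x`, yet its HSIC value should come out clearly larger than that of the independent sample, which is the property that makes HSIC usable for ranking candidate kernels by relevance.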