Integrating multi-OMICS data through sparse canonical correlation analysis for the prediction of complex traits: a comparison study

MOTIVATION: Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach for understanding the relationships between the collected datasets and the complex trait of interest would be through the analysis of each OMI...

Bibliographic Details
Main authors: Rodosthenous, Theodoulos; Shahrezaei, Vahid; Evangelou, Marina
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7750936/
https://www.ncbi.nlm.nih.gov/pubmed/32437529
http://dx.doi.org/10.1093/bioinformatics/btaa530
author Rodosthenous, Theodoulos
Shahrezaei, Vahid
Evangelou, Marina
collection PubMed
description MOTIVATION: Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach to understanding the relationships between the collected datasets and the complex trait of interest is to analyse each OMICS dataset separately from the rest, or to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets, instead of analysing them separately, improves both our understanding of the relationships between them and the predictive accuracy for the tested trait. Several approaches have been proposed for the integration of heterogeneous and high-dimensional ([Formula: see text]) data, such as OMICS. The sparse variant of canonical correlation analysis (CCA) is a promising one: it penalizes the canonical variables to produce sparse latent variables while achieving maximal correlation between the datasets. In recent years, a number of approaches for implementing sparse CCA (sCCA) have been proposed; they differ in their objective functions, in the iterative algorithms used to obtain the sparse latent variables, and in the assumptions they make about the original datasets. RESULTS: Through a comparative study we have explored the performance of the conventional CCA proposed by Parkhomenko et al., the penalized matrix decomposition CCA proposed by Witten and Tibshirani, and its extension proposed by Suo et al. The aforementioned methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding the relationships between datasets, we have recast the problem as a supervised learning one and investigated how the computed latent variables can be used for predicting complex traits. The approaches were also extended to allow for multiple (more than two) datasets, with the trait included as one of the input datasets. Both approaches have shown improvement over conventional predictive models that include one or multiple datasets. AVAILABILITY AND IMPLEMENTATION: https://github.com/theorod93/sCCA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7750936
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7750936 2020-12-28 Integrating multi-OMICS data through sparse canonical correlation analysis for the prediction of complex traits: a comparison study Rodosthenous, Theodoulos Shahrezaei, Vahid Evangelou, Marina Bioinformatics Original Papers Oxford University Press 2020-05-21 /pmc/articles/PMC7750936/ /pubmed/32437529 http://dx.doi.org/10.1093/bioinformatics/btaa530 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Integrating multi-OMICS data through sparse canonical correlation analysis for the prediction of complex traits: a comparison study
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7750936/
https://www.ncbi.nlm.nih.gov/pubmed/32437529
http://dx.doi.org/10.1093/bioinformatics/btaa530