Canonical correlation analysis for multi-omics: Application to cross-cohort analysis

Bibliographic Details
Main authors: Jiang, Min-Zhi; Aguet, François; Ardlie, Kristin; Chen, Jiawen; Cornell, Elaine; Cruz, Dan; Durda, Peter; Gabriel, Stacey B.; Gerszten, Robert E.; Guo, Xiuqing; Johnson, Craig W.; Kasela, Silva; Lange, Leslie A.; Lappalainen, Tuuli; Liu, Yongmei; Reiner, Alex P.; Smith, Josh; Sofer, Tamar; Taylor, Kent D.; Tracy, Russell P.; VanDenBerg, David J.; Wilson, James G.; Rich, Stephen S.; Rotter, Jerome I.; Love, Michael I.; Raffield, Laura M.; Li, Yun
Format: Online article, text
Language: English
Published: Public Library of Science, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10237647/
https://www.ncbi.nlm.nih.gov/pubmed/37216410
http://dx.doi.org/10.1371/journal.pgen.1010517
Collection: PubMed

Abstract
Integrative approaches that simultaneously model multi-omics data have gained popularity because they provide a holistic, systems biology view of multiple, or all, components of a biological system of interest. Canonical correlation analysis (CCA) is a correlation-based integrative method that extracts latent features shared between multiple assays by finding, within each assay, the linear combinations of features (referred to as canonical variables, or CVs) that achieve maximal across-assay correlation. Although widely acknowledged as a powerful approach for multi-omics data, CCA has not been systematically applied to multi-omics data from large cohort studies, as such data have only recently become available. Here, we adapted sparse multiple CCA (SMCCA), a widely used derivative of CCA, to proteomics and methylomics data from the Multi-Ethnic Study of Atherosclerosis (MESA) and the Jackson Heart Study (JHS). To tackle challenges encountered when applying SMCCA to MESA and JHS, our adaptations include incorporating the Gram-Schmidt (GS) algorithm into SMCCA to improve orthogonality among CVs, and developing Sparse Supervised Multiple CCA (SSMCCA) to allow supervised integration analysis of more than two assays. Applying SMCCA to the two real datasets revealed important findings. Using our SMCCA-GS on MESA and JHS, we identified strong associations between blood cell counts and protein abundance, suggesting that protein-based association studies should consider adjusting for blood cell composition. Importantly, CVs obtained from the two independent cohorts also transfer across cohorts. For example, proteomic CVs learned in JHS and transferred to MESA explain similar proportions of blood cell count phenotypic variance in the two cohorts: 39.0% to 50.0% in JHS and 38.9% to 49.1% in MESA. Similar transferability was observed for other omics-CV-trait pairs.
This suggests that CVs capture biologically meaningful, cohort-agnostic variation. We anticipate that applying our SMCCA-GS and SSMCCA to various cohorts would help identify cohort-agnostic, biologically meaningful relationships between multi-omics data and phenotypic traits.
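
The abstract rests on three technical ideas: CVs as correlation-maximizing linear combinations of each assay's features, Gram-Schmidt orthogonalization of CVs, and transferring CV weights learned in one cohort to another. The NumPy sketch below illustrates them under stated assumptions. It uses classical, non-sparse two-assay CCA (whitening plus SVD) on simulated data, not the authors' SMCCA, SMCCA-GS, or SSMCCA; the ridge term `reg`, the data generator, and the helper names `cca`, `gram_schmidt`, `r_squared`, and `simulate_cohort` are hypothetical.

```python
import numpy as np


def _inv_sqrt(cov: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(x: np.ndarray, y: np.ndarray, n_components: int = 2, reg: float = 1e-3):
    """Classical two-view CCA (no sparsity; not the authors' SMCCA) via whitening + SVD.

    Returns weights wx (p x k), wy (q x k) and the k leading canonical
    correlations. `reg` is an assumed small ridge term that keeps the
    covariance matrices invertible.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    kx, ky = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # Singular vectors of the whitened cross-covariance give the canonical directions.
    u, s, vt = np.linalg.svd(kx @ cxy @ ky)
    wx = kx @ u[:, :n_components]
    wy = ky @ vt[:n_components].T
    return wx, wy, s[:n_components]


def gram_schmidt(scores: np.ndarray) -> np.ndarray:
    """Orthonormalize the columns of `scores` (n x k) with modified Gram-Schmidt."""
    q = np.zeros_like(scores, dtype=float)
    for j in range(scores.shape[1]):
        v = scores[:, j].astype(float)  # copy, so the input is not modified
        for i in range(j):
            v = v - (q[:, i] @ v) * q[:, i]  # remove the component along earlier CVs
        q[:, j] = v / np.linalg.norm(v)
    return q


def r_squared(cvs: np.ndarray, trait: np.ndarray) -> float:
    """Fraction of trait variance explained by an OLS fit on the CVs plus an intercept."""
    design = np.column_stack([np.ones(len(trait)), cvs])
    beta, *_ = np.linalg.lstsq(design, trait, rcond=None)
    resid = trait - design @ beta
    return float(1.0 - resid.var() / trait.var())


def simulate_cohort(rng, n, a, b, c):
    """Hypothetical generator: shared latent factors drive both assays and a trait."""
    z = rng.standard_normal((n, a.shape[0]))
    x = z @ a + 0.5 * rng.standard_normal((n, a.shape[1]))  # stand-in for "proteomics"
    y = z @ b + 0.5 * rng.standard_normal((n, b.shape[1]))  # stand-in for "methylomics"
    trait = z @ c + 0.5 * rng.standard_normal(n)
    return x, y, trait


# Hypothetical simulated cohorts drawn from one model (not MESA or JHS data).
rng = np.random.default_rng(0)
a = rng.standard_normal((2, 20))
b = rng.standard_normal((2, 30))
c = np.array([1.0, -0.5])
x_train, y_train, t_train = simulate_cohort(rng, 500, a, b, c)
x_test, _, t_test = simulate_cohort(rng, 500, a, b, c)

# Learn CV weights in the "training" cohort only.
wx, wy, rho = cca(x_train, y_train, n_components=2)

# Apply the same weights in both cohorts (each centered by its own mean,
# which does not affect an R^2 fitted with an intercept), then orthonormalize.
cv_train = gram_schmidt((x_train - x_train.mean(axis=0)) @ wx)
cv_test = gram_schmidt((x_test - x_test.mean(axis=0)) @ wx)

print("canonical correlations:", np.round(rho, 3))
print("trait R^2, training cohort:", round(r_squared(cv_train, t_train), 3))
print("trait R^2, transferred cohort:", round(r_squared(cv_test, t_test), 3))
```

In classical CCA the training-cohort CVs are already nearly uncorrelated, so the Gram-Schmidt step matters mainly for transferred scores or for sparse solutions, where, per the abstract, orthogonality among CVs degrades. Because an OLS R² depends only on the column span of the regressors, orthonormalizing the CVs does not change the variance explained here; it only makes the CVs mutually orthogonal.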

Record ID: pubmed-10237647
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS Genet
Article type: Research Article
Publication date: 2023-05-22
License: © 2023 Jiang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.