
Hierarchical Canonical Correlation Analysis Reveals Phenotype, Genotype, and Geoclimate Associations in Plants

Bibliographic Details
Main Authors: Petegrosso, Raphael; Song, Tianci; Kuang, Rui
Format: Online Article Text
Language: English
Published: AAAS, 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7706319/
https://www.ncbi.nlm.nih.gov/pubmed/33313545
http://dx.doi.org/10.34133/2020/1969142
collection PubMed
description The local environment at the geographical origin of plants has shaped their genetic variation through environmental adaptation. While the characteristics of the local environment correlate with the genotypes and other genomic features of the plants, they can also be indicative of genotype-phenotype associations, providing additional information about environmental dependence. In this study, we investigate how geoclimatic features from the geographical origin of Arabidopsis thaliana accessions can be integrated with genomic features for phenotype prediction and association analysis using advanced canonical correlation analysis (CCA). In particular, we propose a novel method called hierarchical canonical correlation analysis (HCCA) to combine mutations, gene expression, and DNA methylation with geoclimatic features for informative coprojections of the features. HCCA uses the condition number of the cross-covariance between pairs of datasets to infer a hierarchical structure for applying CCA to combine the data. In experiments on Arabidopsis thaliana data from the 1001 Genomes and 1001 Epigenomes projects, together with climatic, atmospheric, and soil environmental variables compiled by CLIMtools, HCCA provided a joint representation of the genomic and geoclimatic data that improved prediction of flowering time at 10°C (FT10) in Arabidopsis thaliana. We also extended HCCA with information from a protein-protein interaction (PPI) network to guide feature learning by imposing network modules onto the genomic features; this extension is shown to be useful for identifying genes with more coherent functions correlated with the geoclimatic features. The findings in this study suggest that environmental data are an important component of plant phenotype analysis.
HCCA is a useful data integration technique for phenotype prediction and for better understanding the interactions between gene functions and the environment, because coprojecting multiple genomic datasets introduces more useful functional information.
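The abstract states only that HCCA ranks pairs of datasets by the condition number of their cross-covariance and uses that to decide the order in which CCA combines them. The sketch below illustrates one way such a scheme could look; it is not the authors' implementation. It assumes NumPy, assumes the best-conditioned pair (smallest condition number) is merged first, and assumes a merged node is simply the concatenation of the two k-dimensional canonical projections. The CCA step is a generic ridge-regularized CCA (whitening followed by SVD), and the names `cross_cov_condition`, `cca`, `hcca`, `k`, and `reg` are hypothetical.

```python
import numpy as np


def cross_cov_condition(X, Y):
    """Condition number (largest / smallest singular value) of the
    cross-covariance between views X (n x p) and Y (n x q), samples in rows."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxy = Xc.T @ Yc / (X.shape[0] - 1)
    return np.linalg.cond(Cxy)


def _inv_sqrt(C, reg):
    """Inverse square root of a ridge-regularized covariance matrix."""
    w, V = np.linalg.eigh(C + reg * np.eye(C.shape[0]))
    return V @ np.diag(w ** -0.5) @ V.T


def cca(X, Y, k, reg=1e-3):
    """Generic regularized CCA via whitening + SVD (not necessarily the
    paper's variant). Returns k-dim projections of X and Y and the
    corresponding canonical correlations."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0] - 1
    Wx = _inv_sqrt(Xc.T @ Xc / n, reg)
    Wy = _inv_sqrt(Yc.T @ Yc / n, reg)
    U, s, Vt = np.linalg.svd(Wx @ (Xc.T @ Yc / n) @ Wy)
    A = Wx @ U[:, :k]  # canonical weights for X
    B = Wy @ Vt[:k].T  # canonical weights for Y
    return Xc @ A, Yc @ B, s[:k]


def hcca(views, k=2, reg=1e-3):
    """Hypothetical HCCA-style greedy merge of a dict of views.

    Assumptions (not from the paper): the pair with the smallest
    cross-covariance condition number is merged first, and the merged
    node is the concatenation of its two k-dim CCA projections."""
    views = dict(views)
    order = []
    while len(views) > 1:
        names = list(views)
        pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
        a, b = min(pairs, key=lambda p: cross_cov_condition(views[p[0]], views[p[1]]))
        Za, Zb, _ = cca(views[a], views[b], k, reg)
        merged_name = f"({a}+{b})"
        del views[a], views[b]
        views[merged_name] = np.hstack([Za, Zb])
        order.append(merged_name)
    return order, next(iter(views.values()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for mutation, expression, methylation, and geoclimate views.
    demo = {name: rng.normal(size=(100, d))
            for name, d in [("snp", 5), ("expr", 4), ("meth", 6), ("geo", 3)]}
    order, Z = hcca(demo, k=2)
    print(order[-1], Z.shape)  # Z.shape == (100, 4)
```

With four views the loop performs three merges, and the root representation has 2k columns. Real genomic views usually have far more features than samples, so a practical version would need stronger regularization or a sparse/low-rank CCA; the choices here are kept minimal for illustration.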
id pubmed-7706319
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Plant Phenomics (Research Article), AAAS, published 2020-03-31
rights Copyright © 2020 Raphael Petegrosso et al. Exclusive Licensee Nanjing Agricultural University. Distributed under a Creative Commons Attribution License (CC BY 4.0): http://creativecommons.org/licenses/by/4.0/
topic Research Article