Genome-wide sparse canonical correlation of gene expression with genotypes

Bibliographic details
Main authors: Parkhomenko, Elena; Tritchler, David; Beyene, Joseph
Format: Text
Language: English
Published: BioMed Central, 2007
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367499/
https://www.ncbi.nlm.nih.gov/pubmed/18466460
Collection: PubMed
Description: There is growing interest in studying natural variation in human gene expression. Studies mapping genetic determinants of expression profiles often consider the expression of one gene at a time, an approach that is computationally intensive and may be prone to a high false-discovery rate because the number of genes under consideration often exceeds tens of thousands. We present an exploratory method for investigating such data and apply it to the data provided as Problem 1 of Genetic Analysis Workshop 15 (GAW15). In multivariate analysis, canonical correlation analysis is a common way to inspect the relationship between two sets of variables based on their correlation. It determines linear combinations of all variables from each data set such that the correlation between the two linear combinations is maximized. However, because of the large number of genes, linear combinations involving all single-nucleotide polymorphism (SNP) loci and gene expression phenotypes lack biological plausibility and interpretability. We introduce sparse canonical correlation analysis, which examines the relationships between many genetic loci and gene expression phenotypes by providing sparse linear combinations that include only a small subset of loci and gene expression phenotypes. These correlated sets of variables are small enough for biological interpretation and further investigation. Applying this method to the GAW15 Problem 1 data, we identified groups of 41 loci and 150 gene expression phenotypes with the highest between-group correlation of 43%.
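
The description contrasts ordinary canonical correlation analysis, whose weight vectors involve every variable, with a sparse version that keeps only a few loci and expression phenotypes. The Python sketch below illustrates one generic way to obtain sparse canonical weights; it is not necessarily the algorithm used by the authors. Assumptions: all variables are standardized, the within-set covariance matrices are treated as identity, and the weight vectors u and v are found by alternating power-iteration steps u ← S(Cv), v ← S(Cᵀu) on the cross-correlation matrix C, where S is soft-thresholding with penalties `lam_u` and `lam_v`. All function names, the penalty value, and the simulated data in the hypothetical `demo` function are illustrative; the toy columns are continuous and are not GAW15 genotypes or expression values.

```python
import math
import random


def standardize(cols):
    """Center each column to mean 0 and scale it to unit sample variance."""
    out = []
    for col in cols:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        out.append([(x - mean) / sd for x in col])
    return out


def cross_corr(xcols, ycols):
    """C[i][j] = sample correlation of standardized X column i and Y column j."""
    n = len(xcols[0])
    return [[sum(a * b for a, b in zip(xc, yc)) / (n - 1) for yc in ycols]
            for xc in xcols]


def soft_threshold(vec, lam):
    """Shrink each entry toward 0 by lam; entries with |a| <= lam become 0."""
    return [math.copysign(max(abs(a) - lam, 0.0), a) for a in vec]


def normalize(vec):
    """Scale to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec] if norm > 0 else vec


def sparse_cca(xcols, ycols, lam_u, lam_v, iters=50):
    """Sparse weights (u, v) for the first canonical pair (hypothetical helper).

    Alternates u <- S(C v) and v <- S(C^T u) with soft-thresholding S,
    treating within-set covariances as identity (a simplifying assumption).
    If a penalty is too large, the corresponding vector collapses to zeros.
    """
    C = cross_corr(xcols, ycols)
    p, q = len(C), len(C[0])
    # Start from the Y variable with the largest squared cross-correlations.
    j0 = max(range(q), key=lambda j: sum(C[i][j] ** 2 for i in range(p)))
    v = [1.0 if j == j0 else 0.0 for j in range(q)]
    u = [0.0] * p
    for _ in range(iters):
        u = normalize(soft_threshold(
            [sum(C[i][j] * v[j] for j in range(q)) for i in range(p)], lam_u))
        v = normalize(soft_threshold(
            [sum(C[i][j] * u[i] for i in range(p)) for j in range(q)], lam_v))
    return u, v


def scores(cols, w):
    """Per-sample linear combination sum_k w[k] * cols[k][t]."""
    return [sum(wk * col[t] for wk, col in zip(w, cols))
            for t in range(len(cols[0]))]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def demo(seed=0, n=200, lam=0.5):
    """Hypothetical toy data: 2 of 6 X columns and 2 of 5 Y columns share a factor z."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = ([[zi + rng.gauss(0, 0.5) for zi in z] for _ in range(2)]
         + [[rng.gauss(0, 1) for _ in range(n)] for _ in range(4)])
    y = ([[zi + rng.gauss(0, 0.5) for zi in z] for _ in range(2)]
         + [[rng.gauss(0, 1) for _ in range(n)] for _ in range(3)])
    xs, ys = standardize(x), standardize(y)
    u, v = sparse_cca(xs, ys, lam_u=lam, lam_v=lam)
    return u, v, pearson(scores(xs, u), scores(ys, v))


if __name__ == "__main__":
    u, v, r = demo()
    print("X weights:", [round(a, 3) for a in u])
    print("Y weights:", [round(b, 3) for b in v])
    print("canonical correlation:", round(r, 3))
```

Running the file prints the two weight vectors and the correlation of the resulting scores. Raising `lam_u` or `lam_v` zeroes out more variables and, past some point, all of them; choosing the penalties (for example by cross-validation) is left out of this sketch. Starting `v` on the single Y variable with the largest squared cross-correlations is a heuristic choice made here, not something stated in the record. Because the reported correlation is computed on the same data used to fit the weights, it is an optimistic estimate.
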
Record ID: pubmed-2367499
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Proc (BMC Proceedings), section Proceedings
Publication date: 2007-12-18
Copyright © 2007 Parkhomenko et al; licensee BioMed Central Ltd.
License: This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.