Genome-wide sparse canonical correlation of gene expression with genotypes
There is a growing interest in studying natural variation in human gene expression. Studies mapping genetic determinants of expression profiles are often carried out considering the expression of one gene at a time, an approach that is computationally intensive and may be prone to a high false-discove...
Main authors: | Parkhomenko, Elena; Tritchler, David; Beyene, Joseph |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2007 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367499/ https://www.ncbi.nlm.nih.gov/pubmed/18466460 |
_version_ | 1782154306434105344 |
---|---|
author | Parkhomenko, Elena; Tritchler, David; Beyene, Joseph |
author_facet | Parkhomenko, Elena; Tritchler, David; Beyene, Joseph |
author_sort | Parkhomenko, Elena |
collection | PubMed |
description | There is a growing interest in studying natural variation in human gene expression. Studies mapping genetic determinants of expression profiles are often carried out considering the expression of one gene at a time, an approach that is computationally intensive and may be prone to a high false-discovery rate because the number of genes under consideration often exceeds tens of thousands. We present an exploratory method for investigating such data and apply it to the data provided as Problem 1 of Genetic Analysis Workshop 15 (GAW15). In multivariate analysis, canonical correlation analysis is a common way to inspect the relationship between two sets of variables based on their correlation. It determines linear combinations of all variables from each data set such that the correlation between the two linear combinations is maximized. However, due to the large number of genes, linear combinations involving all single-nucleotide polymorphism (SNP) loci and gene expression phenotypes lack biological plausibility and interpretability. We introduce sparse canonical correlation analysis, which examines the relationships of many genetic loci and gene expression phenotypes by providing sparse linear combinations that include only a small subset of loci and gene expression phenotypes. These correlated sets of variables are sufficiently small for biological interpretability and further investigation. Applying this method to the GAW15 Problem 1 data, we identified groups of 41 loci and 150 gene expressions with the highest between-group correlation of 43%. |
format | Text |
id | pubmed-2367499 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2367499 2008-05-06 Genome-wide sparse canonical correlation of gene expression with genotypes. Parkhomenko, Elena; Tritchler, David; Beyene, Joseph. BMC Proc. Proceedings. BioMed Central 2007-12-18 /pmc/articles/PMC2367499/ /pubmed/18466460 Text en Copyright © 2007 Parkhomenko et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Genome-wide sparse canonical correlation of gene expression with genotypes |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367499/ https://www.ncbi.nlm.nih.gov/pubmed/18466460 |
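
The description in this record states the idea behind canonical correlation analysis in words. As a reading aid, that idea can be written as an optimization problem. The notation below is an assumption introduced here and does not come from the record: $X$ is the centered genotype matrix, $Y$ is the centered expression matrix, $\Sigma_{XX}$, $\Sigma_{YY}$, and $\Sigma_{XY}$ are their sample covariance and cross-covariance matrices, and $a$ and $b$ are the weight vectors of the two linear combinations. The second display is one commonly used way to impose sparsity (bounds $c_1$, $c_2$ on the $\ell_1$ norms of the weights); it is a generic sketch and may differ from the formulation the authors actually use.

```latex
% Classical CCA: choose weights a, b that maximize the correlation
% between the linear combinations Xa and Yb.
\[
  \rho \;=\; \max_{a,\,b}\;
  \frac{a^{\top}\Sigma_{XY}\,b}
       {\sqrt{a^{\top}\Sigma_{XX}\,a}\;\sqrt{b^{\top}\Sigma_{YY}\,b}}
\]

% One generic sparse variant (an assumption, not taken from the record):
% the same objective with variance constraints, plus l1 bounds that
% drive most entries of a and b to exactly zero.
\[
  \max_{a,\,b}\; a^{\top}\Sigma_{XY}\,b
  \quad\text{subject to}\quad
  a^{\top}\Sigma_{XX}\,a \le 1,\;\;
  b^{\top}\Sigma_{YY}\,b \le 1,\;\;
  \lVert a\rVert_{1} \le c_{1},\;\;
  \lVert b\rVert_{1} \le c_{2}
\]
```

Under this reading, the nonzero entries of $a$ and $b$ correspond to the selected loci and gene expression phenotypes, such as the 41 loci and 150 gene expressions reported in the description.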