Expression reflects population structure

Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of...

Bibliographic Details
Main Authors: Brown, Brielin C., Bray, Nicolas L., Pachter, Lior
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6317812/
https://www.ncbi.nlm.nih.gov/pubmed/30566439
http://dx.doi.org/10.1371/journal.pgen.1007841
_version_ 1783384787017269248
author Brown, Brielin C.
Bray, Nicolas L.
Pachter, Lior
collection PubMed
description Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of the gene expression matrix does not cluster by population. We identify a linear projection that reveals population structure in gene expression data. Our approach relies on the coupling of the principal components of genotype to the principal components of gene expression via canonical correlation analysis. Our method is able to determine the significance of the variance in the canonical correlation projection explained by each gene. We identify 3,571 significant genes, only 837 of which had been previously reported to have an associated eQTL in the GEUVADIS results. We show that our projections are not primarily driven by differences in allele frequency at known cis-eQTLs and that similar projections can be recovered using only several hundred randomly selected genes and SNPs. Finally, we present preliminary work on the consequences for eQTL analysis. We observe that using our projection co-ordinates as covariates results in the discovery of slightly fewer genes with eQTLs, but that these genes replicate in GTEx matched tissue at a slightly higher rate.
format Online
Article
Text
id pubmed-6317812
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6317812 2019-01-19 Expression reflects population structure Brown, Brielin C. Bray, Nicolas L. Pachter, Lior PLoS Genet Research Article
Public Library of Science 2018-12-19 /pmc/articles/PMC6317812/ /pubmed/30566439 http://dx.doi.org/10.1371/journal.pgen.1007841 Text en © 2018 Brown et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Expression reflects population structure
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6317812/
https://www.ncbi.nlm.nih.gov/pubmed/30566439
http://dx.doi.org/10.1371/journal.pgen.1007841
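
The abstract describes coupling the principal components of genotype to the principal components of gene expression via canonical correlation analysis. The following is a minimal sketch of that general idea on synthetic data, not the authors' implementation: the helper names (`top_pcs`, `cca`), the simulated two-population data, the nuisance factor, and the choice of five components are all assumptions made for illustration.

```python
import numpy as np


def top_pcs(matrix, k):
    """Return sample scores on the top-k principal components."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def cca(x, y):
    """Canonical correlation analysis via QR decomposition and SVD.

    Returns the canonical correlations (descending) and the canonical
    variates for x and y, each with unit-norm, mean-zero columns.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(x)  # orthonormal basis for the columns of x
    qy, _ = np.linalg.qr(y)  # orthonormal basis for the columns of y
    u, corr, vt = np.linalg.svd(qx.T @ qy, full_matrices=False)
    return corr, qx @ u, qy @ vt.T


# Synthetic data (hypothetical): two populations of 100 samples each.
rng = np.random.default_rng(0)
n, n_snps, n_genes, k = 200, 500, 300, 5
pop = np.repeat([0, 1], n // 2)

# Genotypes: allele counts drawn from population-specific frequencies.
freq = rng.uniform(0.1, 0.9, size=(2, n_snps))
genotype = rng.binomial(2, freq[pop]).astype(float)

# Expression: noise, a large nuisance factor unrelated to population,
# and a smaller population-specific shift in mean expression.
nuisance = 3.0 * np.outer(rng.normal(size=n), rng.normal(size=n_genes))
shift = np.outer(pop, rng.normal(size=n_genes))
expression = rng.normal(size=(n, n_genes)) + nuisance + shift

# Couple the top PCs of genotype to the top PCs of expression.
corr, geno_variates, expr_variates = cca(
    top_pcs(genotype, k), top_pcs(expression, k)
)
print("canonical correlations:", np.round(corr, 3))
```

In this toy setup the nuisance factor is deliberately made larger than the population shift, so the leading expression component alone is not expected to track population, while the first column of `expr_variates` is the linear combination of expression components most correlated with the genotype components. Reducing both matrices to a few components first keeps the CCA step well-posed when there are more features than samples.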