
Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis


Bibliographic Details
Main Authors: Leek, Jeffrey T; Storey, John D
Format: Text
Language: English
Published: Public Library of Science 2007
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1994707/
https://www.ncbi.nlm.nih.gov/pubmed/17907809
http://dx.doi.org/10.1371/journal.pgen.0030161
collection PubMed
description It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of heterogeneity into an analysis can have widespread and detrimental effects on the study. Not only can this reduce power or induce unwanted dependence across genes, but it can also introduce sources of spurious signal to many genes. This phenomenon is true even for well-designed, randomized studies. We introduce “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies. SVA can be applied in conjunction with standard analysis techniques to accurately capture the relationship between expression and any modeled variables of interest. We apply SVA to disease class, time course, and genetics of gene expression studies. We show that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.
id pubmed-1994707
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-1994707 2007-09-27 PLoS Genet Research Article Public Library of Science 2007-09 2007-09-28 Text en Copyright: © 2007 Leek and Storey. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article