Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis
It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too com...
Main authors: | Leek, Jeffrey T, Storey, John D |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science, 2007 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1994707/ https://www.ncbi.nlm.nih.gov/pubmed/17907809 http://dx.doi.org/10.1371/journal.pgen.0030161 |
_version_ | 1782135493654216704 |
---|---|
author | Leek, Jeffrey T Storey, John D |
author_facet | Leek, Jeffrey T Storey, John D |
author_sort | Leek, Jeffrey T |
collection | PubMed |
description | It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of heterogeneity into an analysis can have widespread and detrimental effects on the study. Not only can this reduce power or induce unwanted dependence across genes, but it can also introduce sources of spurious signal to many genes. This phenomenon is true even for well-designed, randomized studies. We introduce “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies. SVA can be applied in conjunction with standard analysis techniques to accurately capture the relationship between expression and any modeled variables of interest. We apply SVA to disease class, time course, and genetics of gene expression studies. We show that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies. |
format | Text |
id | pubmed-1994707 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-19947072007-09-27 Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis Leek, Jeffrey T Storey, John D PLoS Genet Research Article It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of heterogeneity into an analysis can have widespread and detrimental effects on the study. Not only can this reduce power or induce unwanted dependence across genes, but it can also introduce sources of spurious signal to many genes. This phenomenon is true even for well-designed, randomized studies. We introduce “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies. SVA can be applied in conjunction with standard analysis techniques to accurately capture the relationship between expression and any modeled variables of interest. We apply SVA to disease class, time course, and genetics of gene expression studies. We show that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies. Public Library of Science 2007-09 2007-09-28 /pmc/articles/PMC1994707/ /pubmed/17907809 http://dx.doi.org/10.1371/journal.pgen.0030161 Text en Copyright: © 2007 Leek and Storey. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Leek, Jeffrey T Storey, John D Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis |
title | Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis |
title_full | Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis |
title_fullStr | Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis |
title_full_unstemmed | Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis |
title_short | Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis |
title_sort | capturing heterogeneity in gene expression studies by surrogate variable analysis |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1994707/ https://www.ncbi.nlm.nih.gov/pubmed/17907809 http://dx.doi.org/10.1371/journal.pgen.0030161 |
work_keys_str_mv | AT leekjeffreyt capturingheterogeneityingeneexpressionstudiesbysurrogatevariableanalysis AT storeyjohnd capturingheterogeneityingeneexpressionstudiesbysurrogatevariableanalysis |