Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed

Bibliographic details
Main authors: Jacob, Laurent, Gagnon-Bartsch, Johann A., Speed, Terence P.
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects: Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4679071/
https://www.ncbi.nlm.nih.gov/pubmed/26286812
http://dx.doi.org/10.1093/biostatistics/kxv026
collection PubMed
description When dealing with large-scale gene expression studies, observations are commonly contaminated by sources of unwanted variation such as platforms or batches. Not taking this unwanted variation into account when analyzing the data can lead to spurious associations and to missing important signals. When the analysis is unsupervised, e.g. when the goal is to cluster the samples or to build a corrected version of the dataset (as opposed to the study of an observed factor of interest), taking unwanted variation into account can become a difficult task. The factors driving unwanted variation may be correlated with the unobserved factor of interest, so that correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data. The proposed methods are then evaluated on synthetic data and three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections. All proposed methods are implemented in the Bioconductor package RUVnormalize.
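The abstract's core idea is that genes assumed to be unaffected by the factor of interest (negative controls), or differences between replicate samples of the same condition, carry only unwanted variation, so they can be used to estimate that variation and remove it. As a rough illustration only, the NumPy sketch below assumes a linear model Y = Xβ + Wα + ε, with samples in rows, genes in columns, Y centered per gene, and a known number k of unwanted factors. It shows two simple variants: estimating W from the control genes by SVD and regressing it out, and estimating the row space of α from replicate differences and projecting it out. The function names are hypothetical; this is neither the exact estimator from the paper nor the RUVnormalize API.

```python
import numpy as np


def estimate_unwanted_factors(Y, control_idx, k):
    """Estimate k unwanted factors from negative control genes (sketch).

    Y is an (m samples x n genes) matrix, assumed centered per gene.
    control_idx lists genes assumed unaffected by the factor of interest.
    Returns W_hat with shape (m, k).
    """
    Yc = Y[:, control_idx]
    # On control genes Y_c ~ W @ alpha_c + noise, so the leading left
    # singular vectors approximately span the columns of W.
    U, s, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :k] * s[:k]


def regress_out(Y, W_hat):
    """Fit Y ~ W_hat @ alpha by least squares for every gene, subtract the fit."""
    alpha_hat, *_ = np.linalg.lstsq(W_hat, Y, rcond=None)  # shape (k, n)
    # Any part of the factor of interest correlated with W_hat is removed too.
    return Y - W_hat @ alpha_hat


def unwanted_directions_from_replicates(Y, pairs, k):
    """Estimate an orthonormal basis of the row space of alpha (sketch).

    pairs holds (i, j) sample indices assumed to share the same biological
    condition, so Y[i] - Y[j] ~ (W[i] - W[j]) @ alpha + noise.
    Returns a (k, n) matrix with orthonormal rows.
    """
    D = np.stack([Y[i] - Y[j] for i, j in pairs])
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:k]


def project_out(Y, V):
    """Remove from each sample its component along the rows of V."""
    return Y - (Y @ V.T) @ V
```

Both variants can also strip part of the signal of interest when it is correlated with the estimated unwanted factors (through W_hat in the first case, through the directions V in the second), which is the difficulty the abstract highlights for the unsupervised setting.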
id pubmed-4679071
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4679071 2015-12-16. Biostatistics, Articles. Oxford University Press 2016-01, 2015-08-17. Text, en. © The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Articles