Imputing gene expression to maximize platform compatibility

Microarray measurements of gene expression constitute a large fraction of publicly shared biological data, and are available in the Gene Expression Omnibus (GEO). Many studies use GEO data to shape hypotheses and improve statistical power. Within GEO, the Affymetrix HG-U133A and HG-U133 Plus 2.0 are...

Bibliographic Details
Main authors: Zhou, Weizhuang, Han, Lichy, Altman, Russ B
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408923/
https://www.ncbi.nlm.nih.gov/pubmed/27797771
http://dx.doi.org/10.1093/bioinformatics/btw664
_version_ 1783232385067778048
author Zhou, Weizhuang
Han, Lichy
Altman, Russ B
author_facet Zhou, Weizhuang
Han, Lichy
Altman, Russ B
author_sort Zhou, Weizhuang
collection PubMed
description Microarray measurements of gene expression constitute a large fraction of publicly shared biological data, and are available in the Gene Expression Omnibus (GEO). Many studies use GEO data to shape hypotheses and improve statistical power. Within GEO, the Affymetrix HG-U133A and HG-U133 Plus 2.0 are the two most commonly used microarray platforms for human samples; the HG-U133 Plus 2.0 platform contains 54 220 probes and the HG-U133A array contains a proper subset (21 722 probes). When different platforms are involved, the subset of common genes is most easily compared. This approach results in the exclusion of substantial measured data and can limit downstream analysis. To predict the expression values for the genes unique to the HG-U133 Plus 2.0 platform, we constructed a series of gene expression inference models based on genes common to both platforms. Our model predicts gene expression values that are within the variability observed in controlled replicate studies and are highly correlated with measured data. Using six previously published studies, we also demonstrate the improved performance of the enlarged feature space generated by our model in downstream analysis. AVAILABILITY AND IMPLEMENTATION: The gene inference model described in this paper is available as an R package (affyImpute), which can be downloaded at http://simtk.org/home/affyimpute. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5408923
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-54089232017-05-03 Imputing gene expression to maximize platform compatibility Zhou, Weizhuang Han, Lichy Altman, Russ B Bioinformatics Original Papers Microarray measurements of gene expression constitute a large fraction of publicly shared biological data, and are available in the Gene Expression Omnibus (GEO). Many studies use GEO data to shape hypotheses and improve statistical power. Within GEO, the Affymetrix HG-U133A and HG-U133 Plus 2.0 are the two most commonly used microarray platforms for human samples; the HG-U133 Plus 2.0 platform contains 54 220 probes and the HG-U133A array contains a proper subset (21 722 probes). When different platforms are involved, the subset of common genes is most easily compared. This approach results in the exclusion of substantial measured data and can limit downstream analysis. To predict the expression values for the genes unique to the HG-U133 Plus 2.0 platform, we constructed a series of gene expression inference models based on genes common to both platforms. Our model predicts gene expression values that are within the variability observed in controlled replicate studies and are highly correlated with measured data. Using six previously published studies, we also demonstrate the improved performance of the enlarged feature space generated by our model in downstream analysis. AVAILABILITY AND IMPLEMENTATION: The gene inference model described in this paper is available as an R package (affyImpute), which can be downloaded at http://simtk.org/home/affyimpute. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2017-02-15 2016-11-21 /pmc/articles/PMC5408923/ /pubmed/27797771 http://dx.doi.org/10.1093/bioinformatics/btw664 Text en © The Author 2016. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Original Papers
Zhou, Weizhuang
Han, Lichy
Altman, Russ B
Imputing gene expression to maximize platform compatibility
title Imputing gene expression to maximize platform compatibility
title_full Imputing gene expression to maximize platform compatibility
title_fullStr Imputing gene expression to maximize platform compatibility
title_full_unstemmed Imputing gene expression to maximize platform compatibility
title_short Imputing gene expression to maximize platform compatibility
title_sort imputing gene expression to maximize platform compatibility
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408923/
https://www.ncbi.nlm.nih.gov/pubmed/27797771
http://dx.doi.org/10.1093/bioinformatics/btw664
work_keys_str_mv AT zhouweizhuang imputinggeneexpressiontomaximizeplatformcompatibility
AT hanlichy imputinggeneexpressiontomaximizeplatformcompatibility
AT altmanrussb imputinggeneexpressiontomaximizeplatformcompatibility