Imputing gene expression to maximize platform compatibility
Main authors: | Zhou, Weizhuang; Han, Lichy; Altman, Russ B |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2017 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408923/ https://www.ncbi.nlm.nih.gov/pubmed/27797771 http://dx.doi.org/10.1093/bioinformatics/btw664 |
author | Zhou, Weizhuang; Han, Lichy; Altman, Russ B |
---|---|
collection | PubMed |
description | Microarray measurements of gene expression constitute a large fraction of publicly shared biological data, and are available in the Gene Expression Omnibus (GEO). Many studies use GEO data to shape hypotheses and improve statistical power. Within GEO, the Affymetrix HG-U133A and HG-U133 Plus 2.0 are the two most commonly used microarray platforms for human samples; the HG-U133 Plus 2.0 platform contains 54 220 probes and the HG-U133A array contains a proper subset (21 722 probes). When different platforms are involved, the subset of common genes is most easily compared. This approach results in the exclusion of substantial measured data and can limit downstream analysis. To predict the expression values for the genes unique to the HG-U133 Plus 2.0 platform, we constructed a series of gene expression inference models based on genes common to both platforms. Our model predicts gene expression values that are within the variability observed in controlled replicate studies and are highly correlated with measured data. Using six previously published studies, we also demonstrate the improved performance of the enlarged feature space generated by our model in downstream analysis. AVAILABILITY AND IMPLEMENTATION: The gene inference model described in this paper is available as an R package (affyImpute), which can be downloaded at http://simtk.org/home/affyimpute. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
id | pubmed-5408923 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Bioinformatics |
dates | 2017-05-03; 2017-02-15; 2016-11-21 |
rights | © The Author 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Imputing gene expression to maximize platform compatibility |
topic | Original Papers |