
Normalized Affymetrix expression data are biased by G-quadruplex formation

Probes with runs of four or more guanines (G-stacks) in their sequences can exhibit a level of hybridization that is unrelated to the expression levels of the mRNA that they are intended to measure. This is most likely caused by the formation of G-quadruplexes, where inter-probe guanines form Hoogst...


Bibliographic Details
Main Authors: Shanahan, Hugh P., Memon, Farhat N., Upton, Graham J. G., Harrison, Andrew P.
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3333884/
https://www.ncbi.nlm.nih.gov/pubmed/22199258
http://dx.doi.org/10.1093/nar/gkr1230
author Shanahan, Hugh P.
Memon, Farhat N.
Upton, Graham J. G.
Harrison, Andrew P.
collection PubMed
description Probes with runs of four or more guanines (G-stacks) in their sequences can exhibit a level of hybridization that is unrelated to the expression levels of the mRNA that they are intended to measure. This is most likely caused by the formation of G-quadruplexes, where inter-probe guanines form Hoogsteen hydrogen bonds, which probes with G-stacks are capable of forming. We demonstrate that for a specific microarray data set using the Human HG_U133A Affymetrix GeneChip and RMA normalization there is significant bias in the expression levels, the fold change and the correlations between expression levels. These effects grow more pronounced as the number of G-stack probes in a probe set increases. Approximately 14% of the probe sets are directly affected. The analysis was repeated for a number of other normalization pipelines and two, FARMS and PLIER, minimized the bias to some extent. We estimate that ∼15% of the data sets deposited in the GEO database are susceptible to the effect. The inclusion of G-stack probes in the affected data sets can bias key parameters used in the selection and clustering of genes. The elimination of these probes from any analysis in such affected data sets outweighs the increase of noise in the signal.
format Online
Article
Text
id pubmed-3333884
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3333884 2012-04-23 Normalized Affymetrix expression data are biased by G-quadruplex formation Shanahan, Hugh P. Memon, Farhat N. Upton, Graham J. G. Harrison, Andrew P. Nucleic Acids Res Computational Biology Oxford University Press 2012-04 2011-12-22 /pmc/articles/PMC3333884/ /pubmed/22199258 http://dx.doi.org/10.1093/nar/gkr1230 Text en © The Author(s) 2011. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Normalized Affymetrix expression data are biased by G-quadruplex formation
topic Computational Biology
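
The abstract in this record defines a G-stack as a run of four or more guanines in a probe sequence, reports that the bias grows with the number of G-stack probes in a probe set, and recommends eliminating such probes from analysis. The following is a minimal Python sketch of that flagging and filtering step, not the authors' pipeline. It assumes probe sequences are available as plain DNA strings paired with a probe set identifier; the function names, the identifiers `set_A` and `set_B`, and the example sequences are all hypothetical.

```python
import re
from collections import Counter

# Per the abstract: a G-stack is a run of four or more consecutive guanines.
G_STACK = re.compile(r"G{4,}")


def has_g_stack(sequence: str) -> bool:
    """Return True if the probe sequence contains a run of >= 4 guanines."""
    return G_STACK.search(sequence.upper()) is not None


def count_g_stack_probes(probes):
    """Count G-stack probes per probe set.

    `probes` is an iterable of (probe_set_id, sequence) pairs
    (a hypothetical input layout chosen for this sketch).
    """
    counts = Counter()
    for probe_set_id, sequence in probes:
        counts[probe_set_id] += int(has_g_stack(sequence))
    return dict(counts)


def drop_g_stack_probes(probes):
    """Keep only the probes whose sequences contain no G-stack."""
    return [(ps, seq) for ps, seq in probes if not has_g_stack(seq)]


# Made-up 25-mer sequences for illustration only; not real HG_U133A probes.
EXAMPLE_PROBES = [
    ("set_A", "ACGTGGGGTACCATGCATGCATGCA"),  # contains GGGG
    ("set_A", "ACGTACGTACGTACGTACGTACGTA"),  # no guanine run
    ("set_B", "TTGGGTTGGGTTGGGTTGGGTTGGA"),  # runs of three only
]

if __name__ == "__main__":
    print(count_g_stack_probes(EXAMPLE_PROBES))
    print(len(drop_g_stack_probes(EXAMPLE_PROBES)))
```

Counting per probe set, rather than only flagging individual probes, mirrors the abstract's observation that the effects become more pronounced as the number of G-stack probes in a probe set increases. How the filtered probes are then fed to a normalization method such as RMA is outside the scope of this sketch.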