
Algorithm-driven Artifacts in median polish summarization of Microarray data

BACKGROUND: High-throughput measurement of transcript intensities using Affymetrix type oligonucleotide microarrays has produced a massive quantity of data during the last decade. Different preprocessing techniques exist to convert the raw signal intensities measured by these chips into gene express...


Bibliographic Details
Main Authors: Giorgi, Federico M, Bolger, Anthony M, Lohse, Marc, Usadel, Bjoern
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2998528/
https://www.ncbi.nlm.nih.gov/pubmed/21070630
http://dx.doi.org/10.1186/1471-2105-11-553
collection PubMed
description BACKGROUND: High-throughput measurement of transcript intensities using Affymetrix-type oligonucleotide microarrays has produced a massive quantity of data during the last decade. Different preprocessing techniques exist to convert the raw signal intensities measured by these chips into gene expression estimates. Although these techniques have been widely benchmarked in the context of differential gene expression analysis, there are only a few examples where their performance has been assessed with respect to coexpression-based studies such as sample classification. RESULTS: In the present paper we benchmark the three most widely used normalization procedures (MAS5, RMA and GCRMA) in the context of inter-array correlation analysis, confirming and extending the finding that RMA and GCRMA consistently overestimate sample similarity upon normalization. We determine that median polish summarization is responsible for generating a large proportion of these over-similarity artifacts. Furthermore, we show that most affected probesets also show internal signal disagreement, and tend to be composed of individual probes hitting different gene transcripts. We finally provide a correction to the RMA/GCRMA summarization procedure that massively reduces inter-array correlation artifacts, without affecting the detection of differentially expressed genes. CONCLUSIONS: We propose tRMA as a modification of RMA to normalize microarray experiments for correlation-based analysis.
id pubmed-2998528
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-2998528 2011-01-05 BMC Bioinformatics Research Article BioMed Central 2010-11-11 /pmc/articles/PMC2998528/ /pubmed/21070630 http://dx.doi.org/10.1186/1471-2105-11-553 Text en Copyright ©2010 Giorgi et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article