A first principles approach to differential expression in microarray data analysis

Bibliographic Details
Main author: Rubin, Robert A
Format: Text
Language: English
Published: BioMed Central 2009
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2749840/
https://www.ncbi.nlm.nih.gov/pubmed/19758448
http://dx.doi.org/10.1186/1471-2105-10-292
Collection: PubMed

Abstract:
BACKGROUND: The disparate results from the methods commonly used to determine differential expression in Affymetrix microarray experiments may well result from the wide variety of probe set and probe level models employed. Here we take the approach of making the fewest assumptions about the structure of the microarray data. Specifically, we only require that, under the null hypothesis that a gene is not differentially expressed for specified conditions, for any probe position in the gene's probe set: a) the probe amplitudes are independent and identically distributed over the conditions, and b) the distributions of the replicated probe amplitudes are amenable to classical analysis of variance (ANOVA). Log-amplitudes that have been standardized within-chip meet these conditions well enough for our approach, which is to perform ANOVA across conditions for each probe position, and then take the median of the resulting (1 - p) values as a gene-level measure of differential expression.

RESULTS: We applied the technique to the HG-U133A, HG-U95A, and "Golden Spike" spike-in data sets. The resulting receiver operating characteristic (ROC) curves compared favorably with other published results. This procedure is quite sensitive, so much so that it has revealed the presence of probe sets that might properly be called "unanticipated positives" rather than "false positives", because plots of these probe sets strongly suggest that they are differentially expressed.

CONCLUSION: The median ANOVA (1 - p) approach presented here is a very simple methodology that does not depend on any specific probe level or probe models, and does not require any pre-processing other than within-chip standardization of probe level log amplitudes. Its performance is comparable to other published methods on the standard spike-in data sets, and it has revealed the presence of new categories of probe sets that might properly be referred to as "unanticipated positives" and "unanticipated negatives" that need to be taken into account when using spiked-in data sets as "truthed" test beds.

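The gene-level score described in the abstract can be sketched roughly as follows. This is a minimal, standard-library-only Python illustration under stated assumptions, not the author's code: "within-chip standardization" is assumed to mean z-scoring each chip's log2 amplitudes, the per-position test is assumed to be a classical one-way F-test across conditions, and the F-distribution p-value is computed from the regularized incomplete beta function via a standard continued-fraction evaluation. All function names (`standardize_chip`, `one_way_anova_p`, `median_anova_score`, and the helpers) and the toy data are hypothetical.

```python
# Minimal sketch of a median ANOVA (1 - p) gene-level score, stdlib only.
# Assumptions (not from the source): "within-chip standardization" is taken to
# mean z-scoring each chip's log2 amplitudes, and the per-position test is a
# classical one-way F-test. All function names are hypothetical.
import math
import statistics


def standardize_chip(amplitudes):
    """Log2-transform one chip's probe amplitudes, then z-score them within the chip."""
    logs = [math.log2(a) for a in amplitudes]
    mu = statistics.fmean(logs)
    sd = statistics.stdev(logs)
    return [(x - mu) / sd for x in logs]


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Evaluate the continued fraction on the side where it converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def one_way_anova_p(groups):
    """p-value of a classical one-way ANOVA F-test; groups = list of replicate lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    df1, df2 = k - 1, n - k
    if df1 < 1 or df2 < 1:
        raise ValueError("need >= 2 groups and more observations than groups")
    means = [statistics.fmean(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    if ss_within == 0.0:
        return 0.0 if ss_between > 0.0 else 1.0
    f = (ss_between / df1) / (ss_within / df2)
    # Upper tail of F(df1, df2): P(F > f) = I_{df2/(df2 + df1*f)}(df2/2, df1/2).
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def median_anova_score(probe_set):
    """Median over probe positions of (1 - p) from a per-position one-way ANOVA.

    probe_set[j][c] holds the standardized log-amplitudes of probe position j
    on the replicate chips of condition c.
    """
    return statistics.median(1.0 - one_way_anova_p(pos) for pos in probe_set)


if __name__ == "__main__":
    # Hypothetical toy data: 3 probe positions, 2 conditions x 3 replicate chips.
    toy_probe_set = [
        [[0.1, 0.2, 0.0], [1.1, 0.9, 1.3]],
        [[-0.2, 0.1, 0.0], [0.8, 1.0, 0.7]],
        [[0.3, -0.1, 0.2], [0.2, 0.4, 0.1]],
    ]
    print(round(median_anova_score(toy_probe_set), 4))
```

One consequence of scoring each probe position separately and keeping only the median (1 - p) is that a minority of aberrant probes cannot by themselves drive the gene-level score to an extreme, and no probe-level intensity model is fitted along the way.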
Record ID: pubmed-2749840
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Methodology Article), 2009-09-16
Rights: Copyright © 2009 Rubin; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.