Considerations when using the significance analysis of microarrays (SAM) algorithm

BACKGROUND: Users of microarray technology typically strive to use universally acceptable data analysis strategies to determine significant expression changes in their experiments. One of the most frequently utilised methods for gene expression data analysis is SAM (significance analysis of microarr...

Bibliographic Details
Main Authors: Larsson, Ola, Wahlestedt, Claes, Timmons, James A
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1173086/
https://www.ncbi.nlm.nih.gov/pubmed/15921534
http://dx.doi.org/10.1186/1471-2105-6-129
author Larsson, Ola
Wahlestedt, Claes
Timmons, James A
collection PubMed
description BACKGROUND: Users of microarray technology typically strive to use universally accepted data analysis strategies to determine significant expression changes in their experiments. One of the most frequently used methods for gene expression data analysis is SAM (significance analysis of microarrays). The selection thresholds applied when running SAM may critically alter the conclusions of a study, yet this consideration has not been systematically evaluated in any publication. RESULTS: We examined the effect of discrete data selection criteria (qualification criteria for inclusion) and response thresholds (output filtering) on the number of significant genes reported by SAM. Using a reduced data set, obtained either by applying arbitrary restrictions on abundance calls (e.g. from D-chip) or by applying the fold change (FC) option within SAM (hereafter the FC hurdle), can substantially alter the significant gene list when running SAM in Microsoft Excel. We determined that, for a given final FC criterion (e.g. a 1.5-fold change), the FC hurdle applied within Microsoft Excel SAM alters the number of reported genes above that criterion. The reason is that the FC hurdle changes the composition of the control data set, so that a different significance level (q-value) is obtained for any given gene. This effect can be large enough to change the interpretation of subsequent post hoc analyses, such as ontology overrepresentation analysis. CONCLUSION: Our results argue for caution when using SAM. All data sets analysed with SAM could be reanalysed, taking into account the potential impact of arbitrary thresholds used to trim data sets before significance testing.
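The mechanism described in the abstract, in which a fold change hurdle changes the permutation-based control set and therefore the significance estimate, can be illustrated with a small sketch. This is not the SAM implementation or the authors' analysis: the data are simulated, the score is a simplified moderated t-like statistic with a fixed, arbitrary `s0`, the false discovery rate is estimated at a single cutoff `delta` rather than as per-gene q-values, and the hurdle is assumed to be applied to both the observed and the permuted data. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def d_statistic(a, b, s0=0.1):
    """Return (d, diff): a moderated t-like score and the log2 mean difference."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    diff = mean_b - mean_a
    return diff / (se + s0), diff


def count_calls(data, labels, delta, log2_fc):
    """Count genes with |d| >= delta and |log2 fold change| >= log2_fc."""
    n = 0
    for row in data:
        a = [x for x, g in zip(row, labels) if g == 0]
        b = [x for x, g in zip(row, labels) if g == 1]
        d, diff = d_statistic(a, b)
        if abs(d) >= delta and abs(diff) >= log2_fc:
            n += 1
    return n


def estimate_fdr(data, labels, delta, log2_fc=0.0, n_perm=30, seed=0):
    """Estimate the FDR at cutoff delta from label permutations.

    The same fold change hurdle is applied to the permuted data, so the
    hurdle changes the null (control) counts as well as the observed ones.
    """
    rng = random.Random(seed)
    called = count_calls(data, labels, delta, log2_fc)
    null_counts = []
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)
        null_counts.append(count_calls(data, perm, delta, log2_fc))
    expected_false = statistics.median(null_counts)
    fdr = min(1.0, expected_false / called) if called else 0.0
    return called, fdr


def simulate(n_genes=300, n_changed=30, shift=1.0, sd=0.3, seed=1):
    """Simulate log2 expression for 4 control and 4 treated samples."""
    rng = random.Random(seed)
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    data = []
    for i in range(n_genes):
        effect = shift if i < n_changed else 0.0
        data.append([rng.gauss(8.0 + effect * g, sd) for g in labels])
    return data, labels


data, labels = simulate()
hurdle = math.log2(1.5)  # a 1.5-fold change on the log2 scale
print("no hurdle:   (called, FDR) =", estimate_fdr(data, labels, delta=2.0))
print("with hurdle: (called, FDR) =", estimate_fdr(data, labels, delta=2.0, log2_fc=hurdle))
```

At a fixed cutoff, the genes called with the hurdle are a subset of those called without it, but the estimated FDR attached to them is computed from a different null distribution. Selecting genes at a target FDR can therefore yield different lists depending on whether the fold change criterion is applied inside the procedure or to its output.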
format Text
id pubmed-1173086
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1173086 2005-07-07 Considerations when using the significance analysis of microarrays (SAM) algorithm Larsson, Ola Wahlestedt, Claes Timmons, James A BMC Bioinformatics Correspondence
BioMed Central 2005-05-29 /pmc/articles/PMC1173086/ /pubmed/15921534 http://dx.doi.org/10.1186/1471-2105-6-129 Text en Copyright © 2005 Larsson et al; licensee BioMed Central Ltd.
title Considerations when using the significance analysis of microarrays (SAM) algorithm
topic Correspondence