Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset

BACKGROUND: As more methods are developed to analyze RNA-profiling data, assessing their performance using control datasets becomes increasingly important. RESULTS: We present a 'spike-in' experiment for Affymetrix GeneChips that provides a defined dataset of 3,860 RNA species, which we us...

Bibliographic Details
Main Authors: Choe, Sung E; Boutros, Michael; Michelson, Alan M; Church, George M; Halfon, Marc S
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC551536/
https://www.ncbi.nlm.nih.gov/pubmed/15693945
http://dx.doi.org/10.1186/gb-2005-6-2-r16
author Choe, Sung E
Boutros, Michael
Michelson, Alan M
Church, George M
Halfon, Marc S
collection PubMed
description BACKGROUND: As more methods are developed to analyze RNA-profiling data, assessing their performance using control datasets becomes increasingly important. RESULTS: We present a 'spike-in' experiment for Affymetrix GeneChips that provides a defined dataset of 3,860 RNA species, which we use to evaluate analysis options for identifying differentially expressed genes. The experimental design incorporates two novel features. First, to obtain accurate estimates of false-positive and false-negative rates, 100-200 RNAs are spiked in at each fold-change level of interest, ranging from 1.2 to 4-fold. Second, instead of using an uncharacterized background RNA sample, a set of 2,551 RNA species is used as the constant (1x) set, allowing us to know whether any given probe set is truly present or absent. Application of a large number of analysis methods to this dataset reveals clear variation in their ability to identify differentially expressed genes. False-negative and false-positive rates are minimized when the following options are chosen: subtracting nonspecific signal from the PM probe intensities; performing an intensity-dependent normalization at the probe set level; and incorporating a signal intensity-dependent standard deviation in the test statistic. CONCLUSIONS: A best-route combination of analysis methods is presented that allows detection of approximately 70% of true positives before reaching a 10% false-discovery rate. We highlight areas in need of improvement, including better estimate of false-discovery rates and decreased false-negative rates.
format Text
id pubmed-551536
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-551536 2005-03-03 Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset Choe, Sung E Boutros, Michael Michelson, Alan M Church, George M Halfon, Marc S Genome Biol Research BioMed Central 2005 2005-01-28 /pmc/articles/PMC551536/ /pubmed/15693945 http://dx.doi.org/10.1186/gb-2005-6-2-r16 Text en Copyright © 2005 Choe et al.; licensee BioMed Central Ltd.
title Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset
topic Research
work_keys_str_mv AT choesunge preferredanalysismethodsforaffymetrixgenechipsrevealedbyawhollydefinedcontroldataset
AT boutrosmichael preferredanalysismethodsforaffymetrixgenechipsrevealedbyawhollydefinedcontroldataset
AT michelsonalanm preferredanalysismethodsforaffymetrixgenechipsrevealedbyawhollydefinedcontroldataset
AT churchgeorgem preferredanalysismethodsforaffymetrixgenechipsrevealedbyawhollydefinedcontroldataset
AT halfonmarcs preferredanalysismethodsforaffymetrixgenechipsrevealedbyawhollydefinedcontroldataset