
A white-box approach to microarray probe response characterization: the BaFL pipeline

BACKGROUND: Microarrays depend on appropriate probe design to deliver the promise of accurate genome-wide measurement. Probe design, ideally, produces a unique probe-target match with homogeneous duplex stability over the complete set of probes. Much of microarray pre-processing is concerned with ad...


Bibliographic Details
Main authors: Thompson, Kevin J; Deshmukh, Hrishikesh; Solka, Jeffrey L; Weller, Jennifer W
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2804686/
https://www.ncbi.nlm.nih.gov/pubmed/20040098
http://dx.doi.org/10.1186/1471-2105-10-449
_version_ 1782176174529576960
author Thompson, Kevin J
Deshmukh, Hrishikesh
Solka, Jeffrey L
Weller, Jennifer W
collection PubMed
description BACKGROUND: Microarrays depend on appropriate probe design to deliver the promise of accurate genome-wide measurement. Probe design, ideally, produces a unique probe-target match with homogeneous duplex stability over the complete set of probes. Much of microarray pre-processing is concerned with adjusting for non-ideal probes that do not report target concentration accurately. Cross-hybridizing (non-unique) probes, probe composition and structure, as well as platform effects such as instrument limitations, have been shown to affect the interpretation of signal. Data cleansing pipelines seldom filter specifically for these constraints, relying instead on general statistical tests to remove the most variable probes from the samples in a study. This adjusts probes contributing to ProbeSet (gene) values in a study-specific manner. We refer to the complete set of factors as biologically applied filter levels (BaFL) and have assembled an analysis pipeline for managing them consistently. The pipeline and associated experiments reported here examine the outcome of comprehensively excluding probes affected by known factors on inter-experiment target behavior consistency.
RESULTS: We present here a 'white box' probe filtering and intensity transformation protocol that incorporates currently understood factors affecting probe and target interactions; the method has been tested on data from the Affymetrix human GeneChip HG-U95Av2, using two independent datasets from studies of a complex lung adenocarcinoma phenotype. The protocol incorporates probe-specific effects from SNPs, cross-hybridization and low heteroduplex affinity, as well as effects from scanner sensitivity and sample batches, and includes simple statistical tests for identifying unresolved biological factors leading to sample variability. Subsequent to filtering for these factors, the consistency and reliability of the remaining measurements are shown to be markedly improved.
CONCLUSIONS: The data cleansing protocol yields reproducible estimates of a given probe or ProbeSet's (gene's) relative expression that translate across datasets, allowing for credible cross-experiment comparisons. We provide supporting evidence for the validity of removing several large classes of probes, and for our approaches to removing outlying samples. The resulting expression profiles demonstrate consistency across the two independent datasets. Finally, we demonstrate that, given an appropriate sampling pool, the method enhances the t-test's statistical power to discriminate significantly different means over sample classes.
format Text
id pubmed-2804686
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2804686 2010-01-12 A white-box approach to microarray probe response characterization: the BaFL pipeline Thompson, Kevin J Deshmukh, Hrishikesh Solka, Jeffrey L Weller, Jennifer W BMC Bioinformatics Methodology article BioMed Central 2009-12-29 /pmc/articles/PMC2804686/ /pubmed/20040098 http://dx.doi.org/10.1186/1471-2105-10-449 Text en
Copyright ©2009 Thompson et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A white-box approach to microarray probe response characterization: the BaFL pipeline
topic Methodology article