
Estimation and correction of non-specific binding in a large-scale spike-in experiment

BACKGROUND: The availability of a recently published large-scale spike-in microarray dataset helps us to understand the influence of probe sequence in non-specific binding (NSB) signal and enables the benchmarking of several models for the estimation of NSB. In a typical microarray experiment using...


Bibliographic Details
Main Authors:	Schuster, Eugene F; Blanc, Eric; Partridge, Linda; Thornton, Janet M
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2394775/ https://www.ncbi.nlm.nih.gov/pubmed/17594493 http://dx.doi.org/10.1186/gb-2007-8-6-r126

author	Schuster, Eugene F; Blanc, Eric; Partridge, Linda; Thornton, Janet M
collection	PubMed
description	BACKGROUND: The availability of a recently published large-scale spike-in microarray dataset helps us to understand the influence of probe sequence in non-specific binding (NSB) signal and enables the benchmarking of several models for the estimation of NSB. In a typical microarray experiment using Affymetrix whole genome chips, 30% to 50% of the probes will apparently have absent target transcripts and show only NSB signal, and these probes can have significant repercussions for normalization and the statistical analysis of the data if NSB is not estimated correctly. RESULTS: We have found that the MAS5 perfect match-mismatch (PM-MM) model is a poor model for estimation of NSB, and that the Naef and Zhang sequence-based models can reasonably estimate NSB. In general, using the GC robust multi-array average, which uses Naef binding affinities, to calculate NSB (GC-NSB) outperforms other methods for detecting differential expression. However, there is an intensity dependence of the best performing methods for generating probeset expression values. At low intensity, methods using GC-NSB outperform other methods, but at medium intensity, MAS5 PM-MM methods perform best, and at high intensity, MAS5 PM-MM and Zhang's position-dependent nearest-neighbor (PDNN) methods perform best. CONCLUSION: A combined statistical analysis using the MAS5 PM-MM, GC-NSB and PDNN methods to generate probeset values results in an improved ability to detect differential expression and estimates of false discovery rates compared with the individual methods. Additional improvements in detecting differential expression can be achieved by a strict elimination of empty probesets before normalization. However, there are still large gaps in our understanding of the Affymetrix GeneChip technology, and additional large-scale datasets, in which the concentration of each transcript is known, need to be produced before better models of specific binding can be created.
format	Text
id	pubmed-2394775
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2394775 2008-05-24 Estimation and correction of non-specific binding in a large-scale spike-in experiment Schuster, Eugene F Blanc, Eric Partridge, Linda Thornton, Janet M Genome Biol Research BioMed Central 2007 2007-06-26 /pmc/articles/PMC2394775/ /pubmed/17594493 http://dx.doi.org/10.1186/gb-2007-8-6-r126 Text en Copyright © 2007 Schuster et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Estimation and correction of non-specific binding in a large-scale spike-in experiment
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2394775/ https://www.ncbi.nlm.nih.gov/pubmed/17594493 http://dx.doi.org/10.1186/gb-2007-8-6-r126

Similar Items