
A strand specific high resolution normalization method for chip-sequencing data employing multiple experimental control measurements

BACKGROUND: High-throughput sequencing is becoming the standard tool for investigating protein-DNA interactions or epigenetic modifications. However, the data generated will always contain noise due to e.g. repetitive regions or non-specific antibody interactions. The noise will appear in the form o...


Bibliographic Details
Main authors: Enroth, Stefan; Andersson, Claes R; Andersson, Robin; Wadelius, Claes; Gustafsson, Mats G; Komorowski, Jan
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371767/
https://www.ncbi.nlm.nih.gov/pubmed/22248020
http://dx.doi.org/10.1186/1748-7188-7-2
author Enroth, Stefan
Andersson, Claes R
Andersson, Robin
Wadelius, Claes
Gustafsson, Mats G
Komorowski, Jan
collection PubMed
description BACKGROUND: High-throughput sequencing is becoming the standard tool for investigating protein-DNA interactions or epigenetic modifications. However, the data generated will always contain noise due to e.g. repetitive regions or non-specific antibody interactions. The noise will appear in the form of a background distribution of reads that must be taken into account in the downstream analysis, for example when detecting enriched regions (peak-calling). Several reported peak-callers can take experimental measurements of background tag distribution into account when analysing a data set. Unfortunately, the background is only used to adjust peak calling and not as a pre-processing step that aims at discerning the signal from the background noise. A normalization procedure that extracts the signal of interest would be of universal use when investigating genomic patterns. RESULTS: We formulated such a normalization method based on linear regression and made a proof-of-concept implementation in R and C++. It was tested on simulated as well as on publicly available ChIP-seq data on binding sites for two transcription factors, MAX and FOXA1 and two control samples, Input and IgG. We applied three different peak-callers to (i) raw (un-normalized) data using statistical background models and (ii) raw data with control samples as background and (iii) normalized data without additional control samples as background. The fraction of called regions containing the expected transcription factor binding motif was largest for the normalized data and evaluation with qPCR data for FOXA1 suggested higher sensitivity and specificity using normalized data over raw data with experimental background. CONCLUSIONS: The proposed method can handle several control samples allowing for correction of multiple sources of bias simultaneously. Our evaluation on both synthetic and experimental data suggests that the method is successful in removing background noise.
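The abstract above describes the normalization only as "based on linear regression" with several control samples. As a rough illustration of that general idea, and not the authors' R/C++ implementation, the following sketch regresses a ChIP track on an intercept plus each control track and keeps the positive residual as the normalized signal. The function names, the per-position tracks, the intercept term, and the clamping at zero are all assumptions made for this example; a strand-specific variant would, under the same assumptions, simply run the fit once per strand.

```python
def fit_background(chip, controls):
    """Ordinary least squares of chip on [1, control_1, ..., control_k].

    Hypothetical helper for illustration; returns [intercept, b_1, ..., b_k].
    """
    n = len(chip)
    rows = [[1.0] + [float(c[i]) for c in controls] for i in range(n)]
    p = len(rows[0])
    # Normal equations (X^T X) beta = X^T y.
    a = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    b = [sum(r[j] * y for r, y in zip(rows, chip)) for j in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("control tracks are collinear or constant")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            for k in range(col, p):
                a[r][k] -= f * a[col][k]
            b[r] -= f * b[col]
    # Back substitution.
    beta = [0.0] * p
    for j in reversed(range(p)):
        s = sum(a[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = (b[j] - s) / a[j][j]
    return beta


def normalize(chip, controls):
    """Subtract the fitted background; clamp negatives to zero (an assumption)."""
    beta = fit_background(chip, controls)
    out = []
    for i, y in enumerate(chip):
        background = beta[0] + sum(bk * c[i] for bk, c in zip(beta[1:], controls))
        out.append(max(0.0, y - background))
    return out


# Toy data (invented for the example): two control tracks and a ChIP track
# that is a linear mix of them plus one enriched position at index 5.
input_track = [2, 4, 6, 8, 3, 5, 7, 4, 6, 5, 3, 7]
igg_track = [1, 2, 1, 3, 2, 2, 1, 3, 2, 2, 3, 2]
chip = [1 + 0.5 * x + 2 * g for x, g in zip(input_track, igg_track)]
chip[5] += 12
normalized = normalize(chip, [input_track, igg_track])
```

Because both control tracks enter one joint fit, background that is explained by either control is removed in a single step, which is the property the abstract attributes to using multiple controls.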
format Online
Article
Text
id pubmed-3371767
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3371767 2012-06-13
Algorithms Mol Biol
Research
BioMed Central 2012-01-16
/pmc/articles/PMC3371767/ /pubmed/22248020 http://dx.doi.org/10.1186/1748-7188-7-2
Text en
Copyright ©2012 Enroth et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A strand specific high resolution normalization method for chip-sequencing data employing multiple experimental control measurements
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371767/
https://www.ncbi.nlm.nih.gov/pubmed/22248020
http://dx.doi.org/10.1186/1748-7188-7-2