A strand specific high resolution normalization method for chip-sequencing data employing multiple experimental control measurements
BACKGROUND: High-throughput sequencing is becoming the standard tool for investigating protein-DNA interactions or epigenetic modifications. However, the data generated will always contain noise due to e.g. repetitive regions or non-specific antibody interactions. The noise will appear in the form o...
Main authors: | Enroth, Stefan; Andersson, Claes R; Andersson, Robin; Wadelius, Claes; Gustafsson, Mats G; Komorowski, Jan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371767/ https://www.ncbi.nlm.nih.gov/pubmed/22248020 http://dx.doi.org/10.1186/1748-7188-7-2 |
_version_ | 1782235253788639232 |
---|---|
author | Enroth, Stefan; Andersson, Claes R; Andersson, Robin; Wadelius, Claes; Gustafsson, Mats G; Komorowski, Jan |
author_facet | Enroth, Stefan; Andersson, Claes R; Andersson, Robin; Wadelius, Claes; Gustafsson, Mats G; Komorowski, Jan |
author_sort | Enroth, Stefan |
collection | PubMed |
description | BACKGROUND: High-throughput sequencing is becoming the standard tool for investigating protein-DNA interactions or epigenetic modifications. However, the data generated will always contain noise due to, for example, repetitive regions or non-specific antibody interactions. The noise will appear in the form of a background distribution of reads that must be taken into account in the downstream analysis, for example, when detecting enriched regions (peak-calling). Several reported peak-callers can take experimental measurements of background tag distribution into account when analysing a data set. Unfortunately, the background is only used to adjust peak calling and not as a pre-processing step that aims at discerning the signal from the background noise. A normalization procedure that extracts the signal of interest would be of universal use when investigating genomic patterns. RESULTS: We formulated such a normalization method based on linear regression and made a proof-of-concept implementation in R and C++. It was tested on simulated data as well as on publicly available ChIP-seq data on binding sites for two transcription factors, MAX and FOXA1, and two control samples, Input and IgG. We applied three different peak-callers to (i) raw (un-normalized) data using statistical background models, (ii) raw data with control samples as background, and (iii) normalized data without additional control samples as background. The fraction of called regions containing the expected transcription factor binding motif was largest for the normalized data, and evaluation with qPCR data for FOXA1 suggested higher sensitivity and specificity for normalized data than for raw data with experimental background. CONCLUSIONS: The proposed method can handle several control samples, allowing for correction of multiple sources of bias simultaneously. Our evaluation on both synthetic and experimental data suggests that the method is successful in removing background noise. |
format | Online Article Text |
id | pubmed-3371767 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3371767 2012-06-13 A strand specific high resolution normalization method for chip-sequencing data employing multiple experimental control measurements. Enroth, Stefan; Andersson, Claes R; Andersson, Robin; Wadelius, Claes; Gustafsson, Mats G; Komorowski, Jan. Algorithms Mol Biol. Research. BioMed Central 2012-01-16 /pmc/articles/PMC3371767/ /pubmed/22248020 http://dx.doi.org/10.1186/1748-7188-7-2 Text en Copyright ©2012 Enroth et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Enroth, Stefan Andersson, Claes R Andersson, Robin Wadelius, Claes Gustafsson, Mats G Komorowski, Jan A strand specific high resolution normalization method for chip-sequencing data employing multiple experimental control measurements |
title | A strand specific high resolution normalization method for chip-sequencing data employing multiple experimental control measurements |
title_sort | strand specific high resolution normalization method for chip-sequencing data employing multiple experimental control measurements |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371767/ https://www.ncbi.nlm.nih.gov/pubmed/22248020 http://dx.doi.org/10.1186/1748-7188-7-2 |