
Tiling array data analysis: a multiscale approach using wavelets

BACKGROUND: Tiling array data is hard to interpret due to noise. The wavelet transformation is a widely used technique in signal processing for elucidating the true signal from noisy data. Consequently, we attempted to denoise representative tiling array datasets for ChIP-chip experiments using wave...


Bibliographic Details
Main authors: Karpikov, Alexander; Rozowsky, Joel; Gerstein, Mark
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3055839/
https://www.ncbi.nlm.nih.gov/pubmed/21338513
http://dx.doi.org/10.1186/1471-2105-12-57
author Karpikov, Alexander
Rozowsky, Joel
Gerstein, Mark
author_facet Karpikov, Alexander
Rozowsky, Joel
Gerstein, Mark
author_sort Karpikov, Alexander
collection PubMed
description BACKGROUND: Tiling array data is hard to interpret due to noise. The wavelet transformation is a widely used technique in signal processing for elucidating the true signal from noisy data. Consequently, we attempted to denoise representative tiling array datasets for ChIP-chip experiments using wavelets. In doing this, we used specific wavelet basis functions, Coiflets, since their triangular shape closely resembles the expected profiles of true ChIP-chip peaks. RESULTS: In our wavelet-transformed data, we observed that noise tends to be confined to small scales while the useful signal-of-interest spans multiple large scales. We were also able to show that wavelet coefficients due to non-specific cross-hybridization follow a log-normal distribution, and we used this fact in developing a thresholding procedure. In particular, wavelets allow one to set an unambiguous, absolute threshold, which has been hard to define in ChIP-chip experiments. One can set this threshold by requiring a similar confidence level at different length-scales of the transformed signal. We applied our algorithm to a number of representative ChIP-chip data sets, including those of Pol II and histone modifications, which have a diverse distribution of length-scales of biochemical activity, including some broad peaks. CONCLUSIONS: Finally, we benchmarked our method in comparison to other approaches for scoring ChIP-chip data using spike-ins on the ENCODE Nimblegen tiling array. This comparison demonstrated excellent performance, with wavelets getting the best overall score.
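The thresholding idea in the description above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it swaps in a Haar wavelet for the Coiflets named in the abstract (to stay dependency-free), it assumes a separate noise-only `background` track is available for fitting the log-normal model, and it applies hard thresholding. All function names and parameters (`haar_dwt`, `haar_idwt`, `lognormal_threshold`, `denoise`, `levels`, `confidence`) are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def haar_dwt(signal, levels):
    """Multilevel orthonormal Haar transform (stand-in for Coiflets).

    len(signal) must be divisible by 2**levels.
    Returns (approximation, details) with details[0] the finest scale.
    """
    root2 = math.sqrt(2.0)
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / root2 for a, b in pairs])
        approx = [(a + b) / root2 for a, b in pairs]
    return approx, details


def haar_idwt(approx, details):
    """Invert haar_dwt, rebuilding from the coarsest scale to the finest."""
    root2 = math.sqrt(2.0)
    out = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(out, detail):
            rebuilt.extend(((a + d) / root2, (a - d) / root2))
        out = rebuilt
    return out


def lognormal_threshold(coeffs, confidence):
    """Threshold on |coefficient| at the given quantile of a fitted log-normal.

    Assumption: the magnitudes of noise coefficients are log-normal, so
    log|c| is modelled as normal with the sample mean and standard deviation.
    """
    logs = [math.log(abs(c)) for c in coeffs if c != 0.0]
    if len(logs) < 2:
        return 0.0  # too little data to fit; keep everything at this scale
    z = NormalDist().inv_cdf(confidence)
    return math.exp(mean(logs) + z * stdev(logs))


def denoise(signal, background, levels=3, confidence=0.999):
    """Hard-threshold detail coefficients with the same confidence per scale.

    `background` is an assumed noise-only track of the same length,
    used only to fit the per-scale thresholds.
    """
    approx, details = haar_dwt(signal, levels)
    _, bg_details = haar_dwt(background, levels)
    kept = []
    for detail, bg in zip(details, bg_details):
        t = lognormal_threshold(bg, confidence)
        kept.append([c if abs(c) > t else 0.0 for c in detail])
    return haar_idwt(approx, kept)


if __name__ == "__main__":
    random.seed(0)
    n = 64
    background = [random.gauss(0.0, 1.0) for _ in range(n)]
    # Synthetic triangular peak plus noise, purely for illustration.
    peak = [max(0.0, 20.0 - 2.0 * abs(i - 32)) for i in range(n)]
    noisy = [p + random.gauss(0.0, 1.0) for p in peak]
    print(denoise(noisy, background)[28:36])
```

Using one `confidence` value at every scale mirrors the abstract's point that a single, scale-independent confidence level yields a different absolute threshold per length-scale; with a wavelet library available, the Haar steps could presumably be replaced by a Coiflet transform.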
format Text
id pubmed-3055839
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3055839 2011-03-15 Tiling array data analysis: a multiscale approach using wavelets. Karpikov, Alexander; Rozowsky, Joel; Gerstein, Mark. BMC Bioinformatics. Methodology Article. BioMed Central 2011-02-21 /pmc/articles/PMC3055839/ /pubmed/21338513 http://dx.doi.org/10.1186/1471-2105-12-57 Text en Copyright ©2011 Karpikov et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Karpikov, Alexander
Rozowsky, Joel
Gerstein, Mark
Tiling array data analysis: a multiscale approach using wavelets
title Tiling array data analysis: a multiscale approach using wavelets
title_full Tiling array data analysis: a multiscale approach using wavelets
title_fullStr Tiling array data analysis: a multiscale approach using wavelets
title_full_unstemmed Tiling array data analysis: a multiscale approach using wavelets
title_short Tiling array data analysis: a multiscale approach using wavelets
title_sort tiling array data analysis: a multiscale approach using wavelets
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3055839/
https://www.ncbi.nlm.nih.gov/pubmed/21338513
http://dx.doi.org/10.1186/1471-2105-12-57