Tiling array data analysis: a multiscale approach using wavelets
BACKGROUND: Tiling array data is hard to interpret due to noise. The wavelet transformation is a widely used technique in signal processing for elucidating the true signal from noisy data. Consequently, we attempted to denoise representative tiling array datasets for ChIP-chip experiments using wave...
Main authors: | Karpikov, Alexander, Rozowsky, Joel, Gerstein, Mark |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3055839/ https://www.ncbi.nlm.nih.gov/pubmed/21338513 http://dx.doi.org/10.1186/1471-2105-12-57 |
_version_ | 1782200147682263040 |
---|---|
author | Karpikov, Alexander Rozowsky, Joel Gerstein, Mark |
author_facet | Karpikov, Alexander Rozowsky, Joel Gerstein, Mark |
author_sort | Karpikov, Alexander |
collection | PubMed |
description | BACKGROUND: Tiling array data is hard to interpret due to noise. The wavelet transformation is a widely used technique in signal processing for elucidating the true signal from noisy data. Consequently, we attempted to denoise representative tiling array datasets for ChIP-chip experiments using wavelets. In doing this, we used specific wavelet basis functions, Coiflets, since their triangular shape closely resembles the expected profiles of true ChIP-chip peaks. RESULTS: In our wavelet-transformed data, we observed that noise tends to be confined to small scales while the useful signal-of-interest spans multiple large scales. We were also able to show that wavelet coefficients due to non-specific cross-hybridization follow a log-normal distribution, and we used this fact in developing a thresholding procedure. In particular, wavelets allow one to set an unambiguous, absolute threshold, which has been hard to define in ChIP-chip experiments. One can set this threshold by requiring a similar confidence level at different length-scales of the transformed signal. We applied our algorithm to a number of representative ChIP-chip data sets, including those of Pol II and histone modifications, which have a diverse distribution of length-scales of biochemical activity, including some broad peaks. CONCLUSIONS: Finally, we benchmarked our method in comparison to other approaches for scoring ChIP-chip data using spike-ins on the ENCODE Nimblegen tiling array. This comparison demonstrated excellent performance, with wavelets getting the best overall score. |
format | Text |
id | pubmed-3055839 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-30558392011-03-15 Tiling array data analysis: a multiscale approach using wavelets Karpikov, Alexander Rozowsky, Joel Gerstein, Mark BMC Bioinformatics Methodology Article BACKGROUND: Tiling array data is hard to interpret due to noise. The wavelet transformation is a widely used technique in signal processing for elucidating the true signal from noisy data. Consequently, we attempted to denoise representative tiling array datasets for ChIP-chip experiments using wavelets. In doing this, we used specific wavelet basis functions, Coiflets, since their triangular shape closely resembles the expected profiles of true ChIP-chip peaks. RESULTS: In our wavelet-transformed data, we observed that noise tends to be confined to small scales while the useful signal-of-interest spans multiple large scales. We were also able to show that wavelet coefficients due to non-specific cross-hybridization follow a log-normal distribution, and we used this fact in developing a thresholding procedure. In particular, wavelets allow one to set an unambiguous, absolute threshold, which has been hard to define in ChIP-chip experiments. One can set this threshold by requiring a similar confidence level at different length-scales of the transformed signal. We applied our algorithm to a number of representative ChIP-chip data sets, including those of Pol II and histone modifications, which have a diverse distribution of length-scales of biochemical activity, including some broad peaks. CONCLUSIONS: Finally, we benchmarked our method in comparison to other approaches for scoring ChIP-chip data using spike-ins on the ENCODE Nimblegen tiling array. This comparison demonstrated excellent performance, with wavelets getting the best overall score. BioMed Central 2011-02-21 /pmc/articles/PMC3055839/ /pubmed/21338513 http://dx.doi.org/10.1186/1471-2105-12-57 Text en Copyright ©2011 Karpikov et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Karpikov, Alexander Rozowsky, Joel Gerstein, Mark Tiling array data analysis: a multiscale approach using wavelets |
title | Tiling array data analysis: a multiscale approach using wavelets |
title_full | Tiling array data analysis: a multiscale approach using wavelets |
title_fullStr | Tiling array data analysis: a multiscale approach using wavelets |
title_full_unstemmed | Tiling array data analysis: a multiscale approach using wavelets |
title_short | Tiling array data analysis: a multiscale approach using wavelets |
title_sort | tiling array data analysis: a multiscale approach using wavelets |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3055839/ https://www.ncbi.nlm.nih.gov/pubmed/21338513 http://dx.doi.org/10.1186/1471-2105-12-57 |
work_keys_str_mv | AT karpikovalexander tilingarraydataanalysisamultiscaleapproachusingwavelets AT rozowskyjoel tilingarraydataanalysisamultiscaleapproachusingwavelets AT gersteinmark tilingarraydataanalysisamultiscaleapproachusingwavelets |
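
The description above outlines a scale-by-scale thresholding idea: transform the signal with wavelets, model noise coefficients as log-normal, and apply the same confidence level at every length-scale. The following is a minimal sketch of that idea, not the authors' implementation. It makes several assumptions that are not in the record: it uses the Haar wavelet instead of Coiflets to stay dependency-free, it fits the log-normal distribution to all detail coefficients at a scale (assuming most of them are noise), and the names `haar_decompose`, `lognormal_threshold`, `denoise`, and the `confidence` parameter are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def haar_decompose(signal, levels):
    """Haar DWT: return (approximation, list of detail bands, finest first)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, rebuilding from the coarsest band outward."""
    approx = list(approx)
    for detail in reversed(details):
        out = []
        for s, d in zip(approx, detail):
            out.append((s + d) / math.sqrt(2))
            out.append((s - d) / math.sqrt(2))
        approx = out
    return approx


def lognormal_threshold(coeffs, confidence):
    """Magnitude cutoff from a log-normal fit to |coeffs| at one scale.

    Assumption: most coefficients at this scale are noise, so fitting
    to all of them approximates the noise distribution.
    """
    logs = [math.log(abs(c)) for c in coeffs if c != 0.0]
    if len(logs) < 2:
        return 0.0
    z = NormalDist().inv_cdf(confidence)
    return math.exp(mean(logs) + z * stdev(logs))


def denoise(signal, levels=3, confidence=0.99):
    """Zero detail coefficients below the per-scale cutoff, then invert."""
    approx, details = haar_decompose(signal, levels)
    kept = []
    for detail in details:
        # Same confidence level at every scale; the cutoff itself differs.
        cutoff = lognormal_threshold(detail, confidence)
        kept.append([d if abs(d) > cutoff else 0.0 for d in detail])
    return haar_reconstruct(approx, kept)
```

Calling `denoise(values, levels=3, confidence=0.99)` on a list whose length is divisible by 8 returns a list of the same length. With a wavelet library available, a Coiflet basis could replace the Haar steps, which would be closer to what the description reports.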