A signal processing approach for enriched region detection in RNA polymerase II ChIP-seq data

BACKGROUND: RNA polymerase II (PolII) is essential in gene transcription and ChIP-seq experiments have been used to study PolII binding patterns over the entire genome. However, since PolII enriched regions in the genome can be very long, existing peak finding algorithms for ChIP-seq data are not ad...

Bibliographic Details
Main authors: Han, Zhi; Tian, Lu; Pécot, Thierry; Huang, Tim; Machiraju, Raghu; Huang, Kun
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3375632/
https://www.ncbi.nlm.nih.gov/pubmed/22536865
http://dx.doi.org/10.1186/1471-2105-13-S2-S2
author Han, Zhi
Tian, Lu
Pécot, Thierry
Huang, Tim
Machiraju, Raghu
Huang, Kun
author_sort Han, Zhi
collection PubMed
description BACKGROUND: RNA polymerase II (PolII) is essential in gene transcription, and ChIP-seq experiments have been used to study PolII binding patterns over the entire genome. However, since PolII-enriched regions in the genome can be very long, existing peak-finding algorithms for ChIP-seq data are not adequate for identifying such long regions. METHODS: Here we propose an enriched region detection method for ChIP-seq data that identifies long enriched regions by combining a signal denoising algorithm with a false discovery rate (FDR) approach. The binned ChIP-seq data for PolII are first denoised using a non-local means (NL-means) algorithm. Then, an FDR approach is developed to determine the threshold for marking enriched regions in the binned histogram. RESULTS: We first test our method on a public PolII ChIP-seq dataset and compare our results with published results obtained using the HPeak algorithm. Our results show high consistency with the published results (80-100%). Then, we apply our proposed method to PolII ChIP-seq data generated in our own study on the effects of hormone on the breast cancer cell line MCF7. The results demonstrate that our method can effectively identify long enriched regions in ChIP-seq datasets. Specifically, in MCF7 control samples we identified 5,911 segments at least 4 kbp long (maximum 233,000 bp), and in E2-treated MCF7 samples we identified 6,200 such segments (maximum 325,000 bp). CONCLUSIONS: We demonstrated the effectiveness of this method in studying binding patterns of PolII in cancer cells, which enables further in-depth analysis of transcription regulation and epigenetics. Our method complements existing peak detection algorithms for ChIP-seq experiments.
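The pipeline outlined in the METHODS part of the description (bin, denoise with NL-means, pick a threshold by FDR, mark long regions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the NL-means parameters or the FDR procedure, so the patch radius, search radius, smoothing parameter `h`, the empirical null-versus-observed FDR estimate, the 100 bp bin size, and all function names below are assumptions chosen for illustration. Only the 4 kbp minimum length comes from the record.

```python
import numpy as np


def nl_means_1d(signal, patch_radius=2, search_radius=10, h=1.0):
    """Denoise a 1D binned signal with a basic non-local means filter.

    Each bin is replaced by a weighted average of nearby bins, where the
    weight depends on how similar the surrounding patches are.
    All parameter defaults are illustrative assumptions.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    padded = np.pad(x, patch_radius, mode="reflect")
    # patches[i] is the window of 2 * patch_radius + 1 bins centred on bin i
    patches = np.lib.stride_tricks.sliding_window_view(
        padded, 2 * patch_radius + 1
    )
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - search_radius), min(n, i + search_radius + 1)
        # mean squared difference between patch i and each candidate patch
        d2 = np.mean((patches[lo:hi] - patches[i]) ** 2, axis=1)
        w = np.exp(-d2 / (h * h))
        out[i] = np.dot(w, x[lo:hi]) / w.sum()
    return out


def fdr_threshold(observed, null, alpha=0.05):
    """Smallest threshold whose empirical FDR is at most alpha.

    The FDR at t is estimated as (null bins >= t) / (observed bins >= t),
    rescaled if the arrays differ in length. This estimator is an assumed
    stand-in, not the procedure from the paper.
    """
    observed = np.asarray(observed, dtype=float)
    null = np.asarray(null, dtype=float)
    scale = observed.size / null.size
    for t in np.unique(observed):  # np.unique returns ascending values
        n_obs = np.count_nonzero(observed >= t)  # always >= 1 here
        n_null = np.count_nonzero(null >= t)
        if n_null * scale / n_obs <= alpha:
            return float(t)
    return float("inf")


def enriched_segments(denoised, threshold, bin_size=100, min_length=4000):
    """Return (start_bp, end_bp) for runs of bins at or above threshold."""
    above = np.asarray(denoised) >= threshold
    edges = np.diff(np.concatenate(([0], above.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)  # exclusive end bin of each run
    return [
        (int(s) * bin_size, int(e) * bin_size)
        for s, e in zip(starts, ends)
        if (e - s) * bin_size >= min_length
    ]


if __name__ == "__main__":
    # Synthetic demo: Poisson background with one long enriched block.
    rng = np.random.default_rng(0)
    chip = rng.poisson(2.0, 2000)
    chip[500:700] += rng.poisson(8.0, 200)
    background = rng.poisson(2.0, 2000)  # stand-in for a control sample

    den_chip = nl_means_1d(chip, h=2.0)
    den_null = nl_means_1d(background, h=2.0)
    t = fdr_threshold(den_chip, den_null, alpha=0.05)
    print(enriched_segments(den_chip, t, bin_size=100, min_length=4000))
```

Denoising before thresholding is what lets a long, moderately enriched region survive as one contiguous run instead of fragmenting into many short peaks; the null signal is passed through the same filter so that both distributions are compared on equal terms.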
format Online
Article
Text
id pubmed-3375632
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3375632 2012-06-18 BMC Bioinformatics Proceedings BioMed Central 2012-03-13 /pmc/articles/PMC3375632/ /pubmed/22536865 http://dx.doi.org/10.1186/1471-2105-13-S2-S2 Text en Copyright ©2012 Han et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A signal processing approach for enriched region detection in RNA polymerase II ChIP-seq data
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3375632/
https://www.ncbi.nlm.nih.gov/pubmed/22536865
http://dx.doi.org/10.1186/1471-2105-13-S2-S2