
HPeak: an HMM-based algorithm for defining read-enriched regions in ChIP-Seq data

Bibliographic Details
Main authors:	Qin, Zhaohui S, Yu, Jianjun, Shen, Jincheng, Maher, Christopher A, Hu, Ming, Kalyana-Sundaram, Shanker, Yu, Jindan, Chinnaiyan, Arul M
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2912305/ https://www.ncbi.nlm.nih.gov/pubmed/20598134 http://dx.doi.org/10.1186/1471-2105-11-369

Abstract

BACKGROUND: Protein-DNA interaction constitutes a basic mechanism for the genetic regulation of target gene expression. Deciphering this mechanism has been a daunting task due to the difficulty in characterizing protein-bound DNA on a large scale. A powerful technique has recently emerged that couples chromatin immunoprecipitation (ChIP) with next-generation sequencing (ChIP-Seq). This technique provides a direct survey of the cistrome of transcription factors and other chromatin-associated proteins. In order to realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed to analyze the massive amount of data generated by this method.

RESULTS: Here we introduce HPeak, a Hidden Markov model (HMM)-based Peak-finding algorithm for analyzing ChIP-Seq data to identify protein-interacting genomic regions. In contrast to the majority of available ChIP-Seq analysis software packages, HPeak is a model-based approach allowing for rigorous statistical inference. This approach enables HPeak to accurately infer genomic regions enriched with sequence reads by assuming realistic probability distributions, in conjunction with a novel weighting scheme on the sequencing read coverage.

CONCLUSIONS: Using biologically relevant data collections, we found that HPeak showed a higher prevalence of the expected transcription factor binding motifs in ChIP-enriched sequences relative to the control sequences when compared to other currently available ChIP-Seq analysis approaches. Additionally, in comparison to the ChIP-chip assay, ChIP-Seq provides higher resolution along with improved sensitivity and specificity of binding site detection. Additional file and the HPeak program are freely available at http://www.sph.umich.edu/csg/qin/HPeak.
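
The abstract describes HPeak only at a high level: a hidden Markov model over read coverage that separates read-enriched regions from background. To make that idea concrete, here is a minimal Python sketch of a generic two-state HMM peak caller: bin read start positions, decode the most likely background/enriched state path with the Viterbi algorithm, and merge runs of enriched bins into regions. It is not HPeak's implementation. The plain Poisson emissions, the bin size, the rate and transition constants, and the function names (`bin_counts`, `viterbi`, `enriched_regions`) are all illustrative assumptions, and the paper's read-coverage weighting scheme is not reproduced.

```python
import math
from typing import List, Sequence, Tuple

# All constants below are hypothetical, chosen for illustration only;
# they are not HPeak's parameters or defaults.
BIN_SIZE = 25      # bin width in bp
LAMBDA_BG = 0.5    # mean read count per bin in the background state
LAMBDA_ENR = 5.0   # mean read count per bin in the enriched state
P_STAY = 0.99      # probability of staying in the same state between adjacent bins


def bin_counts(read_starts: Sequence[int], region_len: int,
               bin_size: int = BIN_SIZE) -> List[int]:
    """Count read start positions in consecutive fixed-width bins."""
    n_bins = (region_len + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < region_len:
            counts[pos // bin_size] += 1
    return counts


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(K = k) for K ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi(counts: Sequence[int]) -> List[int]:
    """Most likely state path (0 = background, 1 = enriched) under a 2-state Poisson HMM."""
    if not counts:
        return []
    lams = (LAMBDA_BG, LAMBDA_ENR)
    log_stay, log_switch = math.log(P_STAY), math.log(1 - P_STAY)
    log_trans = [[log_stay, log_switch], [log_switch, log_stay]]
    # score[s]: best log-probability of any state path ending in state s at the current bin
    score = [math.log(0.5) + poisson_logpmf(counts[0], lams[s]) for s in (0, 1)]
    back: List[List[int]] = []  # back[t][s]: best predecessor of state s at bin t + 1
    for k in counts[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cand = [score[r] + log_trans[r][s] for r in (0, 1)]
            best = 0 if cand[0] >= cand[1] else 1
            new_score.append(cand[best] + poisson_logpmf(k, lams[s]))
            ptr.append(best)
        score = new_score
        back.append(ptr)
    # Trace back from the best-scoring final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def enriched_regions(path: Sequence[int], bin_size: int = BIN_SIZE) -> List[Tuple[int, int]]:
    """Merge runs of enriched bins into half-open (start, end) intervals in bp."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i * bin_size
        elif s == 0 and start is not None:
            regions.append((start, i * bin_size))
            start = None
    if start is not None:
        regions.append((start, len(path) * bin_size))
    return regions


if __name__ == "__main__":
    # Toy data: 50 reads clustered in 500-599 bp of a 2,000 bp window, none elsewhere.
    reads = list(range(500, 600, 2))
    path = viterbi(bin_counts(reads, 2000))
    print(enriched_regions(path))  # [(500, 600)]
```

Viterbi decoding returns a single hard segmentation; posterior decoding with the forward-backward algorithm would instead give a per-bin probability of enrichment. On real data the emission and transition parameters would normally be estimated from the data (for example with Baum-Welch) rather than fixed by hand as in this sketch.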

Record Details
Record ID:	pubmed-2912305
Institution:	National Center for Biotechnology Information
Record format:	MEDLINE/PubMed
Journal:	BMC Bioinformatics
Article type:	Methodology Article
Publication date:	2010-07-02
Record date:	2010-07-30
Copyright:	©2010 Qin et al; licensee BioMed Central Ltd.
License:	This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

