Modeling ChIP Sequencing In Silico with Applications

ChIP sequencing (ChIP-seq) is a new method for genomewide mapping of protein binding sites on DNA. It has generated much excitement in functional genomics. To score data and determine adequate sequencing depth, both the genomic background and the binding sites must be properly modeled. To develop a...

Bibliographic Details
Main Authors:	Zhang, Zhengdong D., Rozowsky, Joel, Snyder, Michael, Chang, Joseph, Gerstein, Mark
Format:	Text
Language:	English
Published:	Public Library of Science 2008
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2507756/ https://www.ncbi.nlm.nih.gov/pubmed/18725927 http://dx.doi.org/10.1371/journal.pcbi.1000158

_version_	1782158393398525952
author	Zhang, Zhengdong D. Rozowsky, Joel Snyder, Michael Chang, Joseph Gerstein, Mark
author_facet	Zhang, Zhengdong D. Rozowsky, Joel Snyder, Michael Chang, Joseph Gerstein, Mark
author_sort	Zhang, Zhengdong D.
collection	PubMed
description	ChIP sequencing (ChIP-seq) is a new method for genomewide mapping of protein binding sites on DNA. It has generated much excitement in functional genomics. To score data and determine adequate sequencing depth, both the genomic background and the binding sites must be properly modeled. To develop a computational foundation to tackle these issues, we first performed a study to characterize the observed statistical nature of this new type of high-throughput data. By linking sequence tags into clusters, we show that there are two components to the distribution of tag counts observed in a number of recent experiments: an initial power-law distribution and a subsequent long right tail. Then we develop in silico ChIP-seq, a computational method to simulate the experimental outcome by placing tags onto the genome according to particular assumed distributions for the actual binding sites and for the background genomic sequence. In contrast to current assumptions, our results show that both the background and the binding sites need to have a markedly nonuniform distribution in order to correctly model the observed ChIP-seq data, with, for instance, the background tag counts modeled by a gamma distribution. On the basis of these results, we extend an existing scoring approach by using a more realistic genomic-background model. This enables us to identify transcription-factor binding sites in ChIP-seq data in a statistically rigorous fashion.
format	Text
id	pubmed-2507756
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-2507756 2008-08-22 Modeling ChIP Sequencing In Silico with Applications Zhang, Zhengdong D. Rozowsky, Joel Snyder, Michael Chang, Joseph Gerstein, Mark PLoS Comput Biol Research Article Public Library of Science 2008-08-22 /pmc/articles/PMC2507756/ /pubmed/18725927 http://dx.doi.org/10.1371/journal.pcbi.1000158 Text en Zhang et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Modeling ChIP Sequencing In Silico with Applications
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2507756/ https://www.ncbi.nlm.nih.gov/pubmed/18725927 http://dx.doi.org/10.1371/journal.pcbi.1000158
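
The description field above says that tags are placed onto the genome according to assumed distributions, and that the background has to be markedly nonuniform, with background tag counts modeled by a gamma distribution. The sketch below is only an illustration of that idea and is not the authors' implementation: it assumes the genome is cut into equal bins, gives each bin a gamma-distributed sampling weight, and drops tags into bins in proportion to those weights. The function names, the bin layout, and the `shape` and `scale` values are all hypothetical.

```python
import random


def simulate_background(n_bins, n_tags, shape=0.5, scale=1.0, seed=0):
    """Place n_tags background tags into n_bins bins (illustrative only).

    Assumption: each bin draws a gamma-distributed sampling weight, so some
    bins attract far more tags than others. shape and scale are made-up values.
    """
    rng = random.Random(seed)
    weights = [rng.gammavariate(shape, scale) for _ in range(n_bins)]
    counts = [0] * n_bins
    # Draw a bin for every tag, with probability proportional to its weight.
    for b in rng.choices(range(n_bins), weights=weights, k=n_tags):
        counts[b] += 1
    return counts


def simulate_uniform(n_bins, n_tags, seed=0):
    """Baseline for comparison: every bin is equally likely to receive a tag."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_tags):
        counts[rng.randrange(n_bins)] += 1
    return counts


def variance(xs):
    """Population variance of a list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


if __name__ == "__main__":
    gamma_counts = simulate_background(1000, 10000, seed=1)
    flat_counts = simulate_uniform(1000, 10000, seed=1)
    print("max tags per bin (gamma weights):", max(gamma_counts))
    print("max tags per bin (uniform):", max(flat_counts))
    print("variance (gamma weights):", variance(gamma_counts))
    print("variance (uniform):", variance(flat_counts))
```

Comparing the two count lists shows the practical difference: the uniform baseline keeps every bin close to the mean, while the gamma-weighted version produces a long right tail of heavily covered bins, which is the kind of behavior the description attributes to real ChIP-seq background.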
