BayesPeak: Bayesian analysis of ChIP-seq data

BACKGROUND: High-throughput sequencing technology has become popular and widely used to study protein and DNA interactions. Chromatin immunoprecipitation, followed by sequencing of the resulting samples, produces large amounts of data that can be used to map genomic features such as transcription fa...

Bibliographic Details
Main Authors: Spyrou, Christiana, Stark, Rory, Lynch, Andy G, Tavaré, Simon
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2760534/
https://www.ncbi.nlm.nih.gov/pubmed/19772557
http://dx.doi.org/10.1186/1471-2105-10-299
author Spyrou, Christiana
Stark, Rory
Lynch, Andy G
Tavaré, Simon
collection PubMed
description BACKGROUND: High-throughput sequencing technology has become popular and widely used to study protein and DNA interactions. Chromatin immunoprecipitation, followed by sequencing of the resulting samples, produces large amounts of data that can be used to map genomic features such as transcription factor binding sites and histone modifications. METHODS: Our proposed statistical algorithm, BayesPeak, uses a fully Bayesian hidden Markov model to detect enriched locations in the genome. The structure accommodates the natural features of the Solexa/Illumina sequencing data and allows for overdispersion in the abundance of reads in different regions. Moreover, a control sample can be incorporated in the analysis to account for experimental and sequence biases. Markov chain Monte Carlo algorithms are applied to estimate the posterior distributions of the model parameters, and posterior probabilities are used to detect the sites of interest. CONCLUSION: We have presented a flexible approach for identifying peaks from ChIP-seq reads, suitable for use on both transcription factor binding and histone modification data. Our method estimates probabilities of enrichment that can be used in downstream analysis. The method is assessed using experimentally verified data and is shown to provide high-confidence calls with low false positive rates.
format Text
id pubmed-2760534
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2760534 2009-10-13 BayesPeak: Bayesian analysis of ChIP-seq data Spyrou, Christiana Stark, Rory Lynch, Andy G Tavaré, Simon BMC Bioinformatics Research Article BioMed Central 2009-09-21 /pmc/articles/PMC2760534/ /pubmed/19772557 http://dx.doi.org/10.1186/1471-2105-10-299 Text en Copyright © 2009 Spyrou et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title BayesPeak: Bayesian analysis of ChIP-seq data
topic Research Article