Features that define the best ChIP-seq peak calling algorithms

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an important tool for studying gene regulatory proteins, such as transcription factors and histones. Peak calling is one of the first steps in the analysis of these data. Peak calling consists of two sub-problems: identifying candida...

Bibliographic Details
Main authors:	Thomas, Reuben; Thomas, Sean; Holloway, Alisha K; Pollard, Katherine S
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2017
Subjects:	Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5429005/ https://www.ncbi.nlm.nih.gov/pubmed/27169896 http://dx.doi.org/10.1093/bib/bbw035

_version_	1783235948192989184
author	Thomas, Reuben; Thomas, Sean; Holloway, Alisha K; Pollard, Katherine S
author_facet	Thomas, Reuben; Thomas, Sean; Holloway, Alisha K; Pollard, Katherine S
author_sort	Thomas, Reuben
collection	PubMed
description	Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an important tool for studying gene regulatory proteins, such as transcription factors and histones. Peak calling is one of the first steps in the analysis of these data. Peak calling consists of two sub-problems: identifying candidate peaks and testing candidate peaks for statistical significance. We surveyed 30 methods and identified 12 features of the two sub-problems that distinguish methods from each other. We picked six methods [GEM, MACS2, MUSIC, BCP, Threshold-based method (TM) and ZINBA] that span this feature space and used a combination of 300 simulated ChIP-seq data sets, 3 real data sets and mathematical analyses to identify features of methods that allow some to perform better than the others. We prove that methods that explicitly combine the signals from ChIP and input samples are less powerful than methods that do not. Methods that use windows of different sizes are more powerful than the ones that do not. For statistical testing of candidate peaks, methods that use a Poisson test to rank their candidate peaks are more powerful than those that use a Binomial test. BCP and MACS2 have the best operating characteristics on simulated transcription factor binding data. GEM has the highest fraction of the top 500 peaks containing the binding motif of the immunoprecipitated factor, with 50% of its peaks within 10 base pairs of a motif. BCP and MUSIC perform best on histone data. These findings provide guidance and rationale for selecting the best peak caller for a given application.
format	Online Article Text
id	pubmed-5429005
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-5429005 2017-05-17 Features that define the best ChIP-seq peak calling algorithms Thomas, Reuben; Thomas, Sean; Holloway, Alisha K; Pollard, Katherine S Brief Bioinform Papers Oxford University Press 2017-05 2016-05-11 /pmc/articles/PMC5429005/ /pubmed/27169896 http://dx.doi.org/10.1093/bib/bbw035 Text en © The Author 2016. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Papers Thomas, Reuben Thomas, Sean Holloway, Alisha K Pollard, Katherine S Features that define the best ChIP-seq peak calling algorithms
title	Features that define the best ChIP-seq peak calling algorithms
topic	Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5429005/ https://www.ncbi.nlm.nih.gov/pubmed/27169896 http://dx.doi.org/10.1093/bib/bbw035
