
A Comparison of Peak Callers Used for DNase-Seq Data

Genome-wide profiling of open chromatin regions using DNase I and high-throughput sequencing (DNase-seq) is an increasingly popular approach for finding and studying regulatory elements. A variety of algorithms have been developed to identify regions of open chromatin from raw sequence-tag data, whi...


Bibliographic Details
Main authors: Koohy, Hashem, Down, Thomas A., Spivakov, Mikhail, Hubbard, Tim
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4014496/
https://www.ncbi.nlm.nih.gov/pubmed/24810143
http://dx.doi.org/10.1371/journal.pone.0096303
author Koohy, Hashem
Down, Thomas A.
Spivakov, Mikhail
Hubbard, Tim
collection PubMed
description Genome-wide profiling of open chromatin regions using DNase I and high-throughput sequencing (DNase-seq) is an increasingly popular approach for finding and studying regulatory elements. A variety of algorithms have been developed to identify regions of open chromatin from raw sequence-tag data, which has motivated us to assess and compare their performance. In this study, four published, publicly available peak calling algorithms used for DNase-seq data analysis (F-seq, Hotspot, MACS and ZINBA) are assessed at a range of signal thresholds on two published DNase-seq datasets for three cell types. The results were benchmarked against an independent dataset of regulatory regions derived from ENCODE in vivo transcription factor binding data for each particular cell type. The level of overlap between peak regions reported by each algorithm and this ENCODE-derived reference set was used to assess the sensitivity and specificity of the algorithms. Our study suggests that F-seq has a slightly higher sensitivity than the next best algorithms. Hotspot and the ChIP-seq oriented method, MACS, both perform competitively when used with their default parameters. However, the generic peak finder ZINBA appears to be less sensitive than the other three. We also assess the accuracy of each algorithm over a range of signal thresholds. In particular, we show that the accuracy of F-seq can be considerably improved by using a threshold setting that is different from the default value.
format Online
Article
Text
id pubmed-4014496
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4014496 2014-05-14 A Comparison of Peak Callers Used for DNase-Seq Data Koohy, Hashem Down, Thomas A. Spivakov, Mikhail Hubbard, Tim PLoS One Research Article
Public Library of Science 2014-05-08 /pmc/articles/PMC4014496/ /pubmed/24810143 http://dx.doi.org/10.1371/journal.pone.0096303 Text en © 2014 Koohy et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A Comparison of Peak Callers Used for DNase-Seq Data
topic Research Article