
F-Seq2: improving the feature density based peak caller with dynamic statistics

Genomic and epigenomic features are captured at a genome-wide level by using high-throughput sequencing (HTS) technologies. Peak calling delineates features identified in HTS experiments, such as open chromatin regions and transcription factor binding sites, by comparing the observed read distributi...


Bibliographic Details
Main Authors:	Zhao, Nanxiang; Boyle, Alan P
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Standard Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7902237/ https://www.ncbi.nlm.nih.gov/pubmed/33655209 http://dx.doi.org/10.1093/nargab/lqab012

_version_	1783654519946608640
author	Zhao, Nanxiang Boyle, Alan P
author_facet	Zhao, Nanxiang Boyle, Alan P
author_sort	Zhao, Nanxiang
collection	PubMed
description	Genomic and epigenomic features are captured at a genome-wide level by using high-throughput sequencing (HTS) technologies. Peak calling delineates features identified in HTS experiments, such as open chromatin regions and transcription factor binding sites, by comparing the observed read distributions to a random expectation. Since its introduction, F-Seq has been widely used and shown to be the most sensitive and accurate peak caller for DNase I hypersensitive site (DNase-seq) data. However, the first release (F-Seq1) has two key limitations: lack of support for user-input control datasets, and poor test statistic reporting. These constrain its ability to capture systematic and experimental biases inherent to the background distributions in peak prediction, and to subsequently rank predicted peaks by confidence. To address these limitations, we present F-Seq2, which combines kernel density estimation and a dynamic ‘continuous’ Poisson test to account for local biases and accurately rank candidate peaks. The output of F-Seq2 is suitable for irreproducible discovery rate analysis as test statistics are calculated for individual candidate summits, allowing direct comparison of predictions across replicates. These improvements significantly boost the performance of F-Seq2 for ATAC-seq and ChIP-seq datasets, outperforming competing peak callers used by the ENCODE Consortium in terms of precision and recall.
format	Online Article Text
id	pubmed-7902237
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7902237 2021-03-01 F-Seq2: improving the feature density based peak caller with dynamic statistics Zhao, Nanxiang Boyle, Alan P NAR Genom Bioinform Standard Article Oxford University Press 2021-02-23 /pmc/articles/PMC7902237/ /pubmed/33655209 http://dx.doi.org/10.1093/nargab/lqab012 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	F-Seq2: improving the feature density based peak caller with dynamic statistics
topic	Standard Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7902237/ https://www.ncbi.nlm.nih.gov/pubmed/33655209 http://dx.doi.org/10.1093/nargab/lqab012
