
F-Seq2: improving the feature density based peak caller with dynamic statistics

Genomic and epigenomic features are captured at a genome-wide level by using high-throughput sequencing (HTS) technologies. Peak calling delineates features identified in HTS experiments, such as open chromatin regions and transcription factor binding sites, by comparing the observed read distributi...


Bibliographic Details
Main Authors: Zhao, Nanxiang; Boyle, Alan P
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7902237/
https://www.ncbi.nlm.nih.gov/pubmed/33655209
http://dx.doi.org/10.1093/nargab/lqab012
author Zhao, Nanxiang
Boyle, Alan P
collection PubMed
description Genomic and epigenomic features are captured at a genome-wide level by using high-throughput sequencing (HTS) technologies. Peak calling delineates features identified in HTS experiments, such as open chromatin regions and transcription factor binding sites, by comparing the observed read distributions to a random expectation. Since its introduction, F-Seq has been widely used and shown to be the most sensitive and accurate peak caller for DNase I hypersensitive site (DNase-seq) data. However, the first release (F-Seq1) has two key limitations: lack of support for user-input control datasets, and poor test statistic reporting. These constrain its ability to capture systematic and experimental biases inherent to the background distributions in peak prediction, and to subsequently rank predicted peaks by confidence. To address these limitations, we present F-Seq2, which combines kernel density estimation and a dynamic ‘continuous’ Poisson test to account for local biases and accurately rank candidate peaks. The output of F-Seq2 is suitable for irreproducible discovery rate analysis as test statistics are calculated for individual candidate summits, allowing direct comparison of predictions across replicates. These improvements significantly boost the performance of F-Seq2 for ATAC-seq and ChIP-seq datasets, outperforming competing peak callers used by the ENCODE Consortium in terms of precision and recall.
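The description above names two ingredients: a kernel density estimate of the read signal and a Poisson test against a locally estimated background. The following is a minimal Python sketch of that general idea, not F-Seq2's implementation. The function names, the Gaussian kernel, the bandwidth, the window sizes, and the rule of taking the larger of the genome-wide and local background rates are all assumptions made for illustration, and an ordinary integer Poisson tail is swapped in for the paper's 'continuous' Poisson test, whose details this record does not give.

```python
import math
from bisect import bisect_left


def kde_signal(read_positions, length, bandwidth=20.0):
    """Gaussian kernel density over positions 0..length-1 (summed over reads)."""
    signal = [0.0] * length
    reach = int(4 * bandwidth)  # truncate each kernel at 4 standard deviations
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    for pos in read_positions:
        for x in range(max(0, pos - reach), min(length, pos + reach + 1)):
            z = (x - pos) / bandwidth
            signal[x] += norm * math.exp(-0.5 * z * z)
    return signal


def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam) and integer k."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def count_reads(sorted_positions, start, end):
    """Number of positions p with start <= p < end (input must be sorted)."""
    return bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)


def score_summit(treatment, background, summit, genome_length,
                 window=50, local=500):
    """Return (observed reads, expected reads, p-value) for one candidate summit.

    `background` is a sorted list of control read positions, or the treatment
    reads themselves when no control is available (an assumption of this sketch).
    """
    half = window // 2
    observed = count_reads(treatment, summit - half, summit + half)
    # Genome-wide background rate in reads per base.
    global_rate = len(background) / genome_length
    # Local background rate in a wider window around the summit.
    lo = max(0, summit - local // 2)
    hi = min(genome_length, summit + local // 2)
    local_rate = count_reads(background, lo, hi) / (hi - lo)
    # Use the larger rate so locally elevated background lowers significance.
    lam = max(global_rate, local_rate) * window
    return observed, lam, poisson_sf(observed, lam)
```

A candidate summit can be taken as a local maximum of `kde_signal`, and `score_summit` then attaches a p-value to it. Because each summit gets its own statistic, summits from different replicates can be ranked and compared, which is the property the description says reproducibility analysis relies on.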
format Online
Article
Text
id pubmed-7902237
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7902237 2021-03-01 F-Seq2: improving the feature density based peak caller with dynamic statistics Zhao, Nanxiang Boyle, Alan P NAR Genom Bioinform Standard Article Oxford University Press 2021-02-23 /pmc/articles/PMC7902237/ /pubmed/33655209 http://dx.doi.org/10.1093/nargab/lqab012 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title F-Seq2: improving the feature density based peak caller with dynamic statistics
topic Standard Article