F-Seq2: improving the feature density based peak caller with dynamic statistics
Main authors: | Zhao, Nanxiang; Boyle, Alan P |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7902237/ https://www.ncbi.nlm.nih.gov/pubmed/33655209 http://dx.doi.org/10.1093/nargab/lqab012 |
_version_ | 1783654519946608640 |
---|---|
author | Zhao, Nanxiang; Boyle, Alan P |
collection | PubMed |
description | Genomic and epigenomic features are captured at a genome-wide level by using high-throughput sequencing (HTS) technologies. Peak calling delineates features identified in HTS experiments, such as open chromatin regions and transcription factor binding sites, by comparing the observed read distributions to a random expectation. Since its introduction, F-Seq has been widely used and shown to be the most sensitive and accurate peak caller for DNase I hypersensitive site (DNase-seq) data. However, the first release (F-Seq1) has two key limitations: lack of support for user-input control datasets, and poor test statistic reporting. These constrain its ability to capture systematic and experimental biases inherent to the background distributions in peak prediction, and to subsequently rank predicted peaks by confidence. To address these limitations, we present F-Seq2, which combines kernel density estimation and a dynamic ‘continuous’ Poisson test to account for local biases and accurately rank candidate peaks. The output of F-Seq2 is suitable for irreproducible discovery rate analysis as test statistics are calculated for individual candidate summits, allowing direct comparison of predictions across replicates. These improvements significantly boost the performance of F-Seq2 for ATAC-seq and ChIP-seq datasets, outperforming competing peak callers used by the ENCODE Consortium in terms of precision and recall. |
format | Online Article Text |
id | pubmed-7902237 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7902237 2021-03-01; F-Seq2: improving the feature density based peak caller with dynamic statistics; Zhao, Nanxiang; Boyle, Alan P; NAR Genom Bioinform; Standard Article; Oxford University Press, 2021-02-23; /pmc/articles/PMC7902237/ /pubmed/33655209 http://dx.doi.org/10.1093/nargab/lqab012 (Text, en). © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | F-Seq2: improving the feature density based peak caller with dynamic statistics |
topic | Standard Article |
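The description names two ingredients of F-Seq2: kernel density estimation over read positions, and a "continuous" Poisson test against a local background computed per candidate summit. The Python sketch that follows illustrates that general idea under stated assumptions; it is not F-Seq2's implementation. Assumed here: a Gaussian kernel of bandwidth h; the kernel-weighted read count at a summit is treated as a non-integer "count" x and tested with the continuous tail P(X ≥ x) = P(x, λ), the regularized lower incomplete gamma function, which equals the Poisson upper tail at integer x; and the local background λ is the largest of a genome-wide rate and control-read rates in hypothetical ±100 bp and ±500 bp windows, scaled by h·√(2π) (the expected kernel-weighted count of uniformly placed reads). All function names, window sizes, and the example data are hypothetical.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def regularized_lower_gamma(a: float, x: float) -> float:
    """P(a, x) = gamma(a, x) / Gamma(a) for a > 0, x >= 0 (pure Python)."""
    if x <= 0.0:
        return 0.0
    # Shared prefactor e^{-x} x^a / Gamma(a), computed in log space for stability.
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Power series: converges quickly when x < a + 1.
        term = total = 1.0 / a
        denom = a
        for _ in range(10_000):
            denom += 1.0
            term *= x / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_prefactor)
    # Otherwise evaluate Q(a, x) by continued fraction (modified Lentz) and return 1 - Q.
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) >= tiny else tiny
        c = b + an / c
        c = c if abs(c) >= tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def continuous_poisson_sf(observed: float, lam: float) -> float:
    """Continuous extension of P(X >= observed) for X ~ Poisson(lam) (assumed form)."""
    if observed <= 0.0:
        return 1.0
    return regularized_lower_gamma(observed, lam)


def kernel_weighted_count(positions, center, bandwidth):
    """Sum of Gaussian kernel weights (peak weight 1) of reads around `center`."""
    return sum(math.exp(-0.5 * ((p - center) / bandwidth) ** 2) for p in positions)


def local_lambda(control, center, bandwidth, genome_rate, control_scale, half_windows=(100, 500)):
    """Hypothetical 'dynamic' background: max of genome-wide and windowed control rates."""
    rates = [genome_rate]
    for w in half_windows:
        n = sum(1 for p in control if abs(p - center) <= w)
        rates.append(control_scale * n / (2 * w + 1))  # control reads per bp
    # Expected kernel-weighted count of uniformly placed reads at the chosen rate.
    return max(rates) * bandwidth * SQRT_2PI


def call_summit(treatment, control, start, end, bandwidth, genome_rate, control_scale=1.0):
    """Pick the kernel-density maximum in [start, end] and test it against local lambda."""
    summit = max(range(start, end + 1), key=lambda x: kernel_weighted_count(treatment, x, bandwidth))
    observed = kernel_weighted_count(treatment, summit, bandwidth)
    lam = local_lambda(control, summit, bandwidth, genome_rate, control_scale)
    return summit, observed, continuous_poisson_sf(observed, lam)


# Usage: a 21-read pile-up centred on position 500, a few scattered reads, a flat control.
treatment = [490 + i for i in range(21)] + [100, 300, 700, 900]
control = list(range(0, 1000, 50))
summit, observed, p = call_summit(
    treatment, control, 0, 999, bandwidth=20.0, genome_rate=len(treatment) / 1000
)
print(summit, round(observed, 2), p)
```

Because each summit carries its own test statistic, summits from replicates can be ranked and compared directly, which is the property the description ties to irreproducible discovery rate analysis. Treating a kernel-weighted sum as if it were Poisson is a simplification made for this sketch only.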