Reproducible inference of transcription factor footprints in ATAC-seq and DNase-seq datasets using protocol-specific bias modeling
BACKGROUND: DNase-seq and ATAC-seq are broadly used methods to assay open chromatin regions genome-wide. The single nucleotide resolution of DNase-seq has been further exploited to infer transcription factor binding sites (TFBSs) in regulatory regions through footprinting. Recent studies have demons...
Main authors: | Karabacak Calviello, Aslıhan; Hirsekorn, Antje; Wurmus, Ricardo; Yusuf, Dilmurat; Ohler, Uwe |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6385462/ https://www.ncbi.nlm.nih.gov/pubmed/30791920 http://dx.doi.org/10.1186/s13059-019-1654-y |
_version_ | 1783397209746702336 |
---|---|
author | Karabacak Calviello, Aslıhan Hirsekorn, Antje Wurmus, Ricardo Yusuf, Dilmurat Ohler, Uwe |
collection | PubMed |
description | BACKGROUND: DNase-seq and ATAC-seq are broadly used methods to assay open chromatin regions genome-wide. The single nucleotide resolution of DNase-seq has been further exploited to infer transcription factor binding sites (TFBSs) in regulatory regions through footprinting. Recent studies have demonstrated the sequence bias of DNase I and its adverse effects on footprinting efficiency. However, footprinting and the impact of sequence bias have not been extensively studied for ATAC-seq. RESULTS: Here, we undertake a systematic comparison of the two methods and show that a modification to the ATAC-seq protocol increases its yield and its agreement with DNase-seq data from the same cell line. We demonstrate that the two methods have distinct sequence biases and correct for these protocol-specific biases when performing footprinting. Despite the differences in footprint shapes, the locations of the inferred footprints in ATAC-seq and DNase-seq are largely concordant. However, the protocol-specific sequence biases in conjunction with the sequence content of TFBSs impact the discrimination of footprint from the background, which leads to one method outperforming the other for some TFs. Finally, we address the depth required for reproducible identification of open chromatin regions and TF footprints. CONCLUSIONS: We demonstrate that the impact of bias correction on footprinting performance is greater for DNase-seq than for ATAC-seq and that DNase-seq footprinting leads to better performance. It is possible to infer concordant footprints by using replicates, highlighting the importance of reproducibility assessment. The results presented here provide an overview of the advantages and limitations of footprinting analyses using ATAC-seq and DNase-seq. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1654-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6385462 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6385462 2019-03-04 Genome Biol Research BioMed Central 2019-02-21 /pmc/articles/PMC6385462/ /pubmed/30791920 http://dx.doi.org/10.1186/s13059-019-1654-y Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Reproducible inference of transcription factor footprints in ATAC-seq and DNase-seq datasets using protocol-specific bias modeling |
topic | Research |