
Reproducible inference of transcription factor footprints in ATAC-seq and DNase-seq datasets using protocol-specific bias modeling

BACKGROUND: DNase-seq and ATAC-seq are broadly used methods to assay open chromatin regions genome-wide. The single nucleotide resolution of DNase-seq has been further exploited to infer transcription factor binding sites (TFBSs) in regulatory regions through footprinting. Recent studies have demons...


Bibliographic Details
Main Authors: Karabacak Calviello, Aslıhan; Hirsekorn, Antje; Wurmus, Ricardo; Yusuf, Dilmurat; Ohler, Uwe
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6385462/
https://www.ncbi.nlm.nih.gov/pubmed/30791920
http://dx.doi.org/10.1186/s13059-019-1654-y
collection PubMed
description BACKGROUND: DNase-seq and ATAC-seq are broadly used methods to assay open chromatin regions genome-wide. The single nucleotide resolution of DNase-seq has been further exploited to infer transcription factor binding sites (TFBSs) in regulatory regions through footprinting. Recent studies have demonstrated the sequence bias of DNase I and its adverse effects on footprinting efficiency. However, footprinting and the impact of sequence bias have not been extensively studied for ATAC-seq.
RESULTS: Here, we undertake a systematic comparison of the two methods and show that a modification to the ATAC-seq protocol increases its yield and its agreement with DNase-seq data from the same cell line. We demonstrate that the two methods have distinct sequence biases and correct for these protocol-specific biases when performing footprinting. Despite the differences in footprint shapes, the locations of the inferred footprints in ATAC-seq and DNase-seq are largely concordant. However, the protocol-specific sequence biases in conjunction with the sequence content of TFBSs impact the discrimination of footprint from the background, which leads to one method outperforming the other for some TFs. Finally, we address the depth required for reproducible identification of open chromatin regions and TF footprints.
CONCLUSIONS: We demonstrate that the impact of bias correction on footprinting performance is greater for DNase-seq than for ATAC-seq and that DNase-seq footprinting leads to better performance. It is possible to infer concordant footprints by using replicates, highlighting the importance of reproducibility assessment. The results presented here provide an overview of the advantages and limitations of footprinting analyses using ATAC-seq and DNase-seq.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1654-y) contains supplementary material, which is available to authorized users.
id pubmed-6385462
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6385462 2019-03-04 Genome Biol Research BioMed Central 2019-02-21 /pmc/articles/PMC6385462/ /pubmed/30791920 http://dx.doi.org/10.1186/s13059-019-1654-y Text en
© The Author(s). 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research