
Improving analysis of transcription factor binding sites within ChIP-Seq data based on topological motif enrichment

BACKGROUND: Chromatin immunoprecipitation (ChIP) coupled to high-throughput sequencing (ChIP-Seq) techniques can reveal DNA regions bound by transcription factors (TF). Analysis of the ChIP-Seq regions is now a central component in gene regulation studies. The need remains strong for methods to impr...


Bibliographic Details
Main authors: Worsley Hunt, Rebecca, Mathelier, Anthony, del Peso, Luis, Wasserman, Wyeth W
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4082612/
https://www.ncbi.nlm.nih.gov/pubmed/24927817
http://dx.doi.org/10.1186/1471-2164-15-472
author Worsley Hunt, Rebecca
Mathelier, Anthony
del Peso, Luis
Wasserman, Wyeth W
collection PubMed
description BACKGROUND: Chromatin immunoprecipitation (ChIP) coupled to high-throughput sequencing (ChIP-Seq) techniques can reveal DNA regions bound by transcription factors (TF). Analysis of the ChIP-Seq regions is now a central component in gene regulation studies. The need remains strong for methods to improve the interpretation of ChIP-Seq data and the study of specific TF binding sites (TFBS). RESULTS: We introduce a set of methods to improve the interpretation of ChIP-Seq data, including the inference of mediating TFs based on TFBS motif over-representation analysis and the subsequent study of the spatial distribution of TFBSs. TFBS over-representation analysis applied to ChIP-Seq data is used to detect which TFBSs arise more frequently than expected by chance. Visualization of over-representation analysis results with new composition-bias plots reveals systematic bias in over-representation scores. We introduce the BiasAway background generating software to resolve the problem. A heuristic procedure based on topological motif enrichment relative to the ChIP-Seq peaks’ local maxima highlights peaks likely to be directly bound by a TF of interest. The results suggest that on average two-thirds of a ChIP-Seq dataset’s peaks are bound by the ChIP’d TF; the origin of the remaining peaks is undetermined. Additional visualization methods allow for the study of both inter-TFBS spatial relationships and motif-flanking sequence properties, as demonstrated in case studies for TBP and ZNF143/THAP11. CONCLUSIONS: Topological properties of TFBS within ChIP-Seq datasets can be harnessed to better interpret regulatory sequences. Using GC-content-corrected TFBS over-representation analysis, combined with visualization techniques and analysis of the topological distribution of TFBS, we can distinguish peaks likely to be directly bound by a TF. The new methods will empower researchers in the exploration of gene regulation and TF binding.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-472) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4082612
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4082612 2014-07-18 Improving analysis of transcription factor binding sites within ChIP-Seq data based on topological motif enrichment Worsley Hunt, Rebecca; Mathelier, Anthony; del Peso, Luis; Wasserman, Wyeth W BMC Genomics Research Article BioMed Central 2014-06-13 /pmc/articles/PMC4082612/ /pubmed/24927817 http://dx.doi.org/10.1186/1471-2164-15-472 Text en © Worsley Hunt et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Improving analysis of transcription factor binding sites within ChIP-Seq data based on topological motif enrichment
topic Research Article