Discovery and validation of information theory-based transcription factor and cofactor binding site motifs

Bibliographic Details
Main Authors: Lu, Ruipeng; Mucaki, Eliseos J.; Rogan, Peter K.
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects: Methods Online
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389469/
https://www.ncbi.nlm.nih.gov/pubmed/27899659
http://dx.doi.org/10.1093/nar/gkw1036
_version_ 1782521276381790208
author Lu, Ruipeng
Mucaki, Eliseos J.
Rogan, Peter K.
collection PubMed
description Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The reliability and accuracy of these iPWMs were determined via four independent validation methods, including the detection of experimentally proven binding sites, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. We also predict previously unreported TF coregulatory interactions (e.g. TF complexes). These iPWMs constitute a powerful tool for predicting the effects of sequence variants in known binding sites, performing mutation analysis on regulatory SNPs and predicting previously unrecognized binding sites and target genes.
format Online
Article
Text
id pubmed-5389469
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5389469 2017-04-24 Discovery and validation of information theory-based transcription factor and cofactor binding site motifs Lu, Ruipeng Mucaki, Eliseos J. Rogan, Peter K. Nucleic Acids Res Methods Online Oxford University Press 2017-03-17 2016-11-28 /pmc/articles/PMC5389469/ /pubmed/27899659 http://dx.doi.org/10.1093/nar/gkw1036 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Discovery and validation of information theory-based transcription factor and cofactor binding site motifs
topic Methods Online
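
The description in this record refers to information theory-based position weight matrices (iPWMs) that quantify the strengths of individual binding sites. As a rough illustration of that general idea only, the sketch below builds a weight matrix in bits from a handful of aligned sites and scores a candidate sequence against it. It is not the authors' pipeline: it does not perform the recursive, thresholded entropy minimization mentioned in the description, it omits any small-sample correction, and the function names, the pseudocount value, and the toy sequences are all assumptions made for this example.

```python
import math

BASES = "ACGT"


def information_weight_matrix(sites, pseudocount=0.5):
    """Build a per-position weight matrix (in bits) from aligned sites.

    Each weight is 2 + log2(f(b, l)), where f(b, l) is the
    pseudocount-smoothed frequency of base b at position l.
    Assumption: uppercase DNA, equal-length sites, no small-sample correction.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    n = len(sites)
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        weights = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / (n + 4 * pseudocount)
            weights[base] = 2.0 + math.log2(freq)
        matrix.append(weights)
    return matrix


def site_information(matrix, site):
    """Score one candidate site as the sum of its per-position weights (bits)."""
    if len(site) != len(matrix):
        raise ValueError("site length must match matrix length")
    return sum(matrix[pos][base] for pos, base in enumerate(site))


# Hypothetical toy alignment, invented for illustration (not data from the paper).
toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA", "TTACTCA", "TGACTAA"]
matrix = information_weight_matrix(toy_sites)

for candidate in ["TGACTCA", "AAAAAAA"]:
    print(candidate, round(site_information(matrix, candidate), 2))
```

In this weighting, a sequence that matches the frequent base at each position receives a positive score in bits, while a sequence that mostly uses rare bases scores below zero, which is the sense in which a matrix of this kind can rank candidate sites by predicted strength.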