
A systematic, large-scale comparison of transcription factor binding site models

BACKGROUND: The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs) and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF...

Bibliographic Details
Main authors: Hombach, Daniela; Schwarz, Jana Marie; Robinson, Peter N.; Schuelke, Markus; Seelow, Dominik
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4875604/
https://www.ncbi.nlm.nih.gov/pubmed/27209209
http://dx.doi.org/10.1186/s12864-016-2729-8
_version_ 1782433119739052032
author Hombach, Daniela
Schwarz, Jana Marie
Robinson, Peter N.
Schuelke, Markus
Seelow, Dominik
author_facet Hombach, Daniela
Schwarz, Jana Marie
Robinson, Peter N.
Schuelke, Markus
Seelow, Dominik
author_sort Hombach, Daniela
collection PubMed
description BACKGROUND: The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs) and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these are capable of accurately representing in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified “real” in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls we chose random downstream exonic sequences, which are unlikely to harbour TFBS. All models were assessed by receiver operating characteristics (ROC) analysis. RESULTS: While the area-under-curve was low for most of the tested models with only 47 % reaching a score of 0.7 or higher, we noticed strong differences between the various position-specific scoring matrices with JASPAR and HT-SELEX models showing higher success rates than PBM-derived models. In addition, we found that while TFBS sequences showed a higher degree of conservation than randomly chosen sequences, there was a high variability between individual TFBSs. CONCLUSIONS: Our results show that only a few of the matrix-based models used to predict potential TFBS are able to reliably detect experimentally confirmed TFBS. We compiled our findings in a freely accessible web application called ePOSSUM (http://mutationtaster.charite.de/ePOSSUM/) which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. Additionally, ePOSSUM provides information on the reliability of the prediction using our test set of experimentally confirmed binding sites.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2729-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4875604
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4875604 2016-05-22 A systematic, large-scale comparison of transcription factor binding site models Hombach, Daniela Schwarz, Jana Marie Robinson, Peter N. Schuelke, Markus Seelow, Dominik BMC Genomics Research Article BACKGROUND: The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs) and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these are capable of accurately representing in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified “real” in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls we chose random downstream exonic sequences, which are unlikely to harbour TFBS. All models were assessed by receiver operating characteristics (ROC) analysis. RESULTS: While the area-under-curve was low for most of the tested models with only 47 % reaching a score of 0.7 or higher, we noticed strong differences between the various position-specific scoring matrices with JASPAR and HT-SELEX models showing higher success rates than PBM-derived models. In addition, we found that while TFBS sequences showed a higher degree of conservation than randomly chosen sequences, there was a high variability between individual TFBSs. CONCLUSIONS: Our results show that only a few of the matrix-based models used to predict potential TFBS are able to reliably detect experimentally confirmed TFBS.
We compiled our findings in a freely accessible web application called ePOSSUM (http://mutationtaster.charite.de/ePOSSUM/) which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. Additionally, ePOSSUM provides information on the reliability of the prediction using our test set of experimentally confirmed binding sites. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2729-8) contains supplementary material, which is available to authorized users. BioMed Central 2016-05-21 /pmc/articles/PMC4875604/ /pubmed/27209209 http://dx.doi.org/10.1186/s12864-016-2729-8 Text en © Hombach et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Hombach, Daniela
Schwarz, Jana Marie
Robinson, Peter N.
Schuelke, Markus
Seelow, Dominik
A systematic, large-scale comparison of transcription factor binding site models
title A systematic, large-scale comparison of transcription factor binding site models
title_full A systematic, large-scale comparison of transcription factor binding site models
title_fullStr A systematic, large-scale comparison of transcription factor binding site models
title_full_unstemmed A systematic, large-scale comparison of transcription factor binding site models
title_short A systematic, large-scale comparison of transcription factor binding site models
title_sort systematic, large-scale comparison of transcription factor binding site models
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4875604/
https://www.ncbi.nlm.nih.gov/pubmed/27209209
http://dx.doi.org/10.1186/s12864-016-2729-8