A systematic, large-scale comparison of transcription factor binding site models
BACKGROUND: The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs) and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF...
Main authors: | Hombach, Daniela; Schwarz, Jana Marie; Robinson, Peter N.; Schuelke, Markus; Seelow, Dominik |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4875604/ https://www.ncbi.nlm.nih.gov/pubmed/27209209 http://dx.doi.org/10.1186/s12864-016-2729-8 |
_version_ | 1782433119739052032 |
---|---|
author | Hombach, Daniela Schwarz, Jana Marie Robinson, Peter N. Schuelke, Markus Seelow, Dominik |
author_sort | Hombach, Daniela |
collection | PubMed |
description | BACKGROUND: The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs), and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these are capable of accurately representing in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified “real” in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls we chose random downstream exonic sequences, which are unlikely to harbour TFBSs. All models were assessed by receiver operating characteristic (ROC) analysis. RESULTS: While the area-under-curve was low for most of the tested models, with only 47 % reaching a score of 0.7 or higher, we noticed strong differences between the various position-specific scoring matrices, with JASPAR and HT-SELEX models showing higher success rates than PBM-derived models. In addition, we found that while TFBS sequences showed a higher degree of conservation than randomly chosen sequences, there was a high variability between individual TFBSs. CONCLUSIONS: Our results show that only a few of the matrix-based models used to predict potential TFBSs are able to reliably detect experimentally confirmed TFBSs. We compiled our findings in a freely accessible web application called ePOSSUM (http://mutationtaster.charite.de/ePOSSUM/) which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. Additionally, ePOSSUM provides information on the reliability of the prediction using our test set of experimentally confirmed binding sites. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2729-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4875604 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4875604 2016-05-22 A systematic, large-scale comparison of transcription factor binding site models Hombach, Daniela Schwarz, Jana Marie Robinson, Peter N. Schuelke, Markus Seelow, Dominik BMC Genomics Research Article BioMed Central 2016-05-21 /pmc/articles/PMC4875604/ /pubmed/27209209 http://dx.doi.org/10.1186/s12864-016-2729-8 Text en © Hombach et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A systematic, large-scale comparison of transcription factor binding site models |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4875604/ https://www.ncbi.nlm.nih.gov/pubmed/27209209 http://dx.doi.org/10.1186/s12864-016-2729-8 |
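
The evaluation described in the abstract (scoring confirmed binding sites and negative control sequences with a matrix model, then summarising the separation by ROC analysis) can be illustrated with a minimal sketch. This is not the authors' pipeline: the count matrix, the sequences, the pseudocount, the uniform background and all function names below are hypothetical and chosen only for illustration, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores (assumed uniform background)."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix

def best_window_score(matrix, sequence):
    """Highest matrix score over all windows of the sequence (forward strand only).

    Assumes the sequence is at least as long as the matrix.
    """
    width = len(matrix)
    return max(
        sum(matrix[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )

def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Toy data invented for this sketch; not taken from JASPAR, HT-SELEX, PBMs or ENCODE.
toy_counts = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
toy_positives = ["TTACGTAA", "GGACGTCC", "ACGTACGT"]  # stand-ins for confirmed sites
toy_negatives = ["TTTTTTTT", "GGGGCCCC", "CACACACA"]  # stand-ins for exonic controls

pwm = log_odds_matrix(toy_counts)
pos = [best_window_score(pwm, s) for s in toy_positives]
neg = [best_window_score(pwm, s) for s in toy_negatives]
auc = roc_auc(pos, neg)
print(f"AUC = {auc:.2f}")
```

The pairwise AUC used here is equivalent to the area under the ROC curve and keeps the sketch free of external dependencies; with real data, a library routine and scanning of both strands would likely be preferable.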