Transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations
Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression...
Main authors: | Lu, Ruipeng; Rogan, Peter K. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | F1000 Research Limited 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6464064/ https://www.ncbi.nlm.nih.gov/pubmed/31001412 http://dx.doi.org/10.12688/f1000research.17363.2 |
_version_ | 1783410827830755328 |
---|---|
author | Lu, Ruipeng Rogan, Peter K. |
author_facet | Lu, Ruipeng Rogan, Peter K. |
author_sort | Lu, Ruipeng |
collection | PubMed |
description | Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets using Machine Learning (ML). Methods: Bray-Curtis Similarity was used to identify genes with correlated expression patterns across 53 tissues. TF targets from knockdown experiments were also analyzed by this approach to set up the ML framework. TFBSs were selected within DNase I-accessible intervals of corresponding promoter sequences using information theory-based position weight matrices (iPWMs) for each TF. Features from information-dense clusters of TFBSs were input to ML classifiers which predict these gene targets along with their accuracy, specificity and sensitivity. Mutations in TFBSs were analyzed in silico to examine their impact on TFBS clustering and predict changes in gene regulation. Results: The glucocorticoid receptor gene (NR3C1), whose regulation has been extensively studied, was selected to test this approach. SLC25A32 and TANK exhibited the most similar expression patterns to NR3C1. A Decision Tree classifier exhibited the best performance in detecting such genes, based on Area Under the Receiver Operating Characteristic curve (ROC). TF target gene prediction was confirmed using siRNA knockdown, which was more accurate than CRISPR/CAS9 inactivation. TFBS mutation analyses revealed that accurate target gene prediction required at least 1 information-dense TFBS cluster. Conclusions: ML based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns. Multiple information-dense TFBS clusters in promoters appear to protect promoters from effects of deleterious binding site mutations in a single TFBS that would otherwise alter regulation of these genes. |
format | Online Article Text |
id | pubmed-6464064 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | F1000 Research Limited |
record_format | MEDLINE/PubMed |
spelling | pubmed-6464064 2019-04-17 Transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations Lu, Ruipeng Rogan, Peter K. F1000Res Research Article Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets using Machine Learning (ML). Methods: Bray-Curtis Similarity was used to identify genes with correlated expression patterns across 53 tissues. TF targets from knockdown experiments were also analyzed by this approach to set up the ML framework. TFBSs were selected within DNase I-accessible intervals of corresponding promoter sequences using information theory-based position weight matrices (iPWMs) for each TF. Features from information-dense clusters of TFBSs were input to ML classifiers which predict these gene targets along with their accuracy, specificity and sensitivity. Mutations in TFBSs were analyzed in silico to examine their impact on TFBS clustering and predict changes in gene regulation. Results: The glucocorticoid receptor gene (NR3C1), whose regulation has been extensively studied, was selected to test this approach. SLC25A32 and TANK exhibited the most similar expression patterns to NR3C1. A Decision Tree classifier exhibited the best performance in detecting such genes, based on Area Under the Receiver Operating Characteristic curve (ROC). TF target gene prediction was confirmed using siRNA knockdown, which was more accurate than CRISPR/CAS9 inactivation. TFBS mutation analyses revealed that accurate target gene prediction required at least 1 information-dense TFBS cluster. Conclusions: ML based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns. Multiple information-dense TFBS clusters in promoters appear to protect promoters from effects of deleterious binding site mutations in a single TFBS that would otherwise alter regulation of these genes. F1000 Research Limited 2019-04-08 /pmc/articles/PMC6464064/ /pubmed/31001412 http://dx.doi.org/10.12688/f1000research.17363.2 Text en Copyright: © 2019 Lu R and Rogan PK http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Lu, Ruipeng Rogan, Peter K. Transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations |
title | Transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations |
title_sort | transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6464064/ https://www.ncbi.nlm.nih.gov/pubmed/31001412 http://dx.doi.org/10.12688/f1000research.17363.2 |
work_keys_str_mv | AT luruipeng transcriptionfactorbindingsiteclustersidentifytargetgeneswithsimilartissuewideexpressionandbufferagainstmutations AT roganpeterk transcriptionfactorbindingsiteclustersidentifytargetgeneswithsimilartissuewideexpressionandbufferagainstmutations |
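
The description field above states that Bray-Curtis Similarity was used to identify genes with correlated expression patterns across tissues. As a rough illustration only, the sketch below computes that similarity as one minus the standard Bray-Curtis dissimilarity between two non-negative expression vectors and ranks candidate genes against a reference profile. The function names, the gene labels, and the four-value toy profiles are hypothetical and are not taken from the article; the record does not say how the authors normalized or implemented the measure.

```python
from typing import Dict, List, Sequence, Tuple


def bray_curtis_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Return 1 - Bray-Curtis dissimilarity for two non-negative vectors."""
    if len(x) != len(y):
        raise ValueError("expression vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("expression values must be non-negative")
    total = sum(x) + sum(y)
    if total == 0:
        # Assumption: two all-zero profiles are treated as identical.
        return 1.0
    diff = sum(abs(a - b) for a, b in zip(x, y))
    return 1.0 - diff / total


def rank_by_similarity(
    reference: Sequence[float],
    profiles: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Sort genes from most to least similar to the reference profile."""
    scored = [
        (gene, bray_curtis_similarity(reference, values))
        for gene, values in profiles.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one value per tissue (the article used 53 tissues).
toy_reference = [10.0, 5.0, 0.0, 2.0]
toy_profiles = {
    "gene_a": [9.0, 6.0, 1.0, 2.0],   # close to the reference
    "gene_b": [0.0, 0.0, 8.0, 0.0],   # expressed only where the reference is silent
}

ranking = rank_by_similarity(toy_reference, toy_profiles)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

A score of 1 means the two profiles are identical and 0 means they share no expression in any tissue, so the top of the ranking plays the role of the "most similar expression patterns" mentioned in the abstract.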