Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest
BACKGROUND: It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which the TF is expressed, even though an individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a hi...
Main Authors: | Wang, Xin; Lin, Peijie; Ho, Joshua W. K. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5780765/ https://www.ncbi.nlm.nih.gov/pubmed/29363433 http://dx.doi.org/10.1186/s12864-017-4340-z |
_version_ | 1783294804189249536 |
---|---|
author | Wang, Xin Lin, Peijie Ho, Joshua W. K. |
author_facet | Wang, Xin Lin, Peijie Ho, Joshua W. K. |
author_sort | Wang, Xin |
collection | PubMed |
description | BACKGROUND: It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which the TF is expressed, even though an individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs – a motif grammar – located at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier model? RESULTS: We present a Random Forest (RF) based approach to build a multi-class classifier that predicts the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets of TFs (TCF7L2 and MAX) across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell types. Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, thus allowing us to discover the underlying cell-type specific motif grammar. CONCLUSIONS: Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-017-4340-z) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5780765 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5780765 2018-02-06 Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest Wang, Xin Lin, Peijie Ho, Joshua W. K. BMC Genomics Research BACKGROUND: It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which the TF is expressed, even though an individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs – a motif grammar – located at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier model? RESULTS: We present a Random Forest (RF) based approach to build a multi-class classifier that predicts the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets of TFs (TCF7L2 and MAX) across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell types. Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, thus allowing us to discover the underlying cell-type specific motif grammar. CONCLUSIONS: Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-017-4340-z) contains supplementary material, which is available to authorized users. BioMed Central 2018-01-19 /pmc/articles/PMC5780765/ /pubmed/29363433 http://dx.doi.org/10.1186/s12864-017-4340-z Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Wang, Xin Lin, Peijie Ho, Joshua W. K. Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest |
title | Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest |
title_full | Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest |
title_fullStr | Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest |
title_full_unstemmed | Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest |
title_short | Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest |
title_sort | discovery of cell-type specific dna motif grammar in cis-regulatory elements using random forest |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5780765/ https://www.ncbi.nlm.nih.gov/pubmed/29363433 http://dx.doi.org/10.1186/s12864-017-4340-z |
work_keys_str_mv | AT wangxin discoveryofcelltypespecificdnamotifgrammarincisregulatoryelementsusingrandomforest AT linpeijie discoveryofcelltypespecificdnamotifgrammarincisregulatoryelementsusingrandomforest AT hojoshuawk discoveryofcelltypespecificdnamotifgrammarincisregulatoryelementsusingrandomforest |
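
The description field above summarizes a Random Forest classifier that predicts cell type from the motif content of a TF binding site, followed by rule mining on the trained forest. The record gives no implementation details, so the following is only a minimal sketch of the general idea under stated assumptions: binding sites are encoded as binary motif presence/absence vectors, the forest is a small hand-written bagged ensemble of decision trees, and a "rule" is taken to be a root-to-leaf path. The motif names, cell-type labels, toy data, and parameters (`n_trees`, `n_try`, `seed`) are all hypothetical, and the frequency-based rule ranking is a stand-in, not the rule-mining procedure of the article.

```python
import random
from collections import Counter

# Hypothetical motif names: one shared core motif plus three co-factor motifs.
MOTIFS = ["core", "co_A", "co_B", "co_C"]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, n_try, rng):
    """Grow one unpruned tree on binary features; a leaf is a class label."""
    if len(set(y)) == 1:
        return y[0]
    best = None
    for f in rng.sample(range(len(X[0])), n_try):  # random feature subset
        absent = [i for i, row in enumerate(X) if row[f] == 0]
        present = [i for i, row in enumerate(X) if row[f] == 1]
        if not absent or not present:
            continue  # motif is constant at this node, so it cannot split
        score = (len(absent) * gini([y[i] for i in absent])
                 + len(present) * gini([y[i] for i in present])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, absent, present)
    if best is None:
        return majority(y)
    _, f, absent, present = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], n_try, rng)

    return (f, grow(absent), grow(present))

def fit_forest(X, y, n_trees=25, n_try=3, seed=0):
    """Train each tree on a bootstrap resample of the binding sites."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_try, rng))
    return forest

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, absent, present = tree
        tree = present if row[f] == 1 else absent
    return tree

def predict(forest, row):
    """Majority vote over all trees."""
    return majority([predict_tree(t, row) for t in forest])

def rules(tree, path=()):
    """Yield (conditions, label) for every root-to-leaf path of a tree."""
    if not isinstance(tree, tuple):
        yield path, tree
        return
    f, absent, present = tree
    yield from rules(absent, path + ((MOTIFS[f], False),))
    yield from rules(present, path + ((MOTIFS[f], True),))

# Toy data (invented): every site has the core motif; the co-motif marks the cell type.
X = [[1, 1, 0, 0]] * 10 + [[1, 0, 1, 0]] * 10 + [[1, 0, 0, 1]] * 10
y = ["cell_A"] * 10 + ["cell_B"] * 10 + ["cell_C"] * 10

forest = fit_forest(X, y)
rule_counts = Counter(rule for tree in forest for rule in rules(tree))
top_rules = rule_counts.most_common(3)  # most frequently recurring paths
```

Note that this sketch scores the forest only on its own training data; the article reports cross-validation, which would require holding out sites before calling `fit_forest`. Inspecting `top_rules` shows which motif presence/absence combinations recur across trees, which is one simple way to read a candidate motif grammar off a forest.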