A Machine Learning Approach for Identifying Novel Cell Type–Specific Transcriptional Regulators of Myogenesis
Main authors: | Busser, Brian W.; Taher, Leila; Kim, Yongsok; Tansey, Terese; Bloom, Molly J.; Ovcharenko, Ivan; Michelson, Alan M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2012 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3297574/ https://www.ncbi.nlm.nih.gov/pubmed/22412381 http://dx.doi.org/10.1371/journal.pgen.1002531 |
author | Busser, Brian W.; Taher, Leila; Kim, Yongsok; Tansey, Terese; Bloom, Molly J.; Ovcharenko, Ivan; Michelson, Alan M. |
collection | PubMed |
description | Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA–based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in largely similar patterns as their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes; and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers. In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type–specific developmental gene expression patterns. |
format | Online Article Text |
id | pubmed-3297574 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
journal | PLoS Genet; Research Article; published 2012-03-08 |
license | This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication (https://creativecommons.org/publicdomain/zero/1.0/). |
topic | Research Article |
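The description says the classifier identified enhancers "based on the presence or absence of known and putative TFBSs." The Python sketch below illustrates only that presence/absence feature encoding, under stated assumptions. The motif strings are hypothetical placeholders, not the motifs used in the study. Matching is exact on both strands. The classifier is a Bernoulli naive Bayes model, swapped in for brevity. This record does not say which classifier or motif models the authors used, and phylogenetic profiling is not modeled here.

```python
import math

# Hypothetical placeholder motifs (exact-match consensus strings); these are
# NOT the TFBS models learned or used in the study.
MOTIFS = {
    "motif_A": "ATGCAAAT",
    "motif_B": "CCGGAAGT",
    "motif_C": "TGTTTAC",
    "motif_D": "AGGTGTGA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def presence_vector(seq: str, motifs=MOTIFS):
    """Encode seq as 1/0 per motif: present on either strand, or absent."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return [int(m in seq or m in rc) for m in motifs.values()]


def train(vectors, labels):
    """Fit a Bernoulli naive Bayes model with add-one (Laplace) smoothing."""
    model = {}
    n = len(labels)
    for c in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        n_c = len(rows)
        # P(motif present | class), smoothed so no probability is 0 or 1.
        probs = [(sum(col) + 1) / (n_c + 2) for col in zip(*rows)]
        model[c] = (math.log(n_c / n), probs)
    return model


def predict(model, vector):
    """Return the class with the highest log posterior score."""
    def score(c):
        log_prior, probs = model[c]
        return log_prior + sum(
            math.log(p) if x else math.log(1 - p) for x, p in zip(vector, probs)
        )
    return max(model, key=score)


# Toy training data (invented for illustration): 1 = enhancer, 0 = background.
TRAINING = [
    ("GGATGCAAATCCAGGTGTGACC", 1),
    ("TTATGCAAATGG", 1),
    ("CCAGGTGTGATT", 1),
    ("CCGGAAGTTTTT", 0),
    ("GGGGCCCCAAAA", 0),
    ("TGTTTACGGGCC", 0),
]
model = train([presence_vector(s) for s, _ in TRAINING], [y for _, y in TRAINING])

# motif_A occurs here only on the reverse strand (as ATTTGCAT).
print(predict(model, presence_vector("AAATTTGCATAAAGGTGTGAAA")))  # → 1
```

Binary presence/absence features discard motif counts, spacing, and order. That is a deliberate simplification here, since the description notes extensive TFBS shuffling among orthologous enhancers.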