A Machine Learning Approach for Identifying Novel Cell Type–Specific Transcriptional Regulators of Myogenesis

Bibliographic Details
Main authors: Busser, Brian W., Taher, Leila, Kim, Yongsok, Tansey, Terese, Bloom, Molly J., Ovcharenko, Ivan, Michelson, Alan M.
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2012
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3297574/
https://www.ncbi.nlm.nih.gov/pubmed/22412381
http://dx.doi.org/10.1371/journal.pgen.1002531
Collection: PubMed
Description: Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA–based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in largely similar patterns as their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes; and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers. In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type–specific developmental gene expression patterns.
Record ID: pubmed-3297574
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS Genet, Research Article, published 2012-03-08
License: This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication (https://creativecommons.org/publicdomain/zero/1.0/).