
Learning and interpreting the gene regulatory grammar in a deep learning framework

Bibliographic Details
Main authors:	Chen, Ling, Capra, John A.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2020
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7660921/ https://www.ncbi.nlm.nih.gov/pubmed/33137083 http://dx.doi.org/10.1371/journal.pcbi.1008334

author	Chen, Ling Capra, John A.
collection	PubMed
description	Deep neural networks (DNNs) have achieved state-of-the-art performance in identifying gene regulatory sequences, but they have provided limited insight into the biology of regulatory elements due to the difficulty of interpreting the complex features they learn. Several models of how combinatorial binding of transcription factors, i.e., the regulatory grammar, drives enhancer activity have been proposed, ranging from the flexible TF billboard model to the stringent enhanceosome model. However, there is limited knowledge of the prevalence of these (or other) sequence architectures across enhancers. Here we perform several hypothesis-driven analyses to explore the ability of DNNs to learn the regulatory grammar of enhancers. We created synthetic datasets based on existing hypotheses about combinatorial transcription factor binding site (TFBS) patterns, including homotypic clusters, heterotypic clusters, and enhanceosomes, using real TF binding motifs from diverse TF families. We then trained deep residual neural networks (ResNets) to model the sequences under a range of scenarios that reflect real-world multi-label regulatory sequence prediction tasks. We developed a gradient-based unsupervised clustering method to extract the patterns learned by the ResNet models. We demonstrate that simulated regulatory grammars are best learned in the penultimate layer of the ResNets, and that the proposed method can accurately retrieve the regulatory grammar even when there is heterogeneity in the enhancer categories and a large fraction of TFBS outside of the regulatory grammar. However, we also identify common scenarios where ResNets fail to learn simulated regulatory grammars. Finally, we applied the proposed method to mouse developmental enhancers and were able to identify the components of a known heterotypic TF cluster. Our results provide a framework for interpreting the regulatory rules learned by ResNets, and they demonstrate that the ability and efficiency of ResNets in learning the regulatory grammar depend on the nature of the prediction task.
format	Online Article Text
id	pubmed-7660921
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-76609212020-11-18 Learning and interpreting the gene regulatory grammar in a deep learning framework Chen, Ling Capra, John A. PLoS Comput Biol Research Article Public Library of Science 2020-11-02 /pmc/articles/PMC7660921/ /pubmed/33137083 http://dx.doi.org/10.1371/journal.pcbi.1008334 Text en © 2020 Chen, Capra http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Learning and interpreting the gene regulatory grammar in a deep learning framework
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7660921/ https://www.ncbi.nlm.nih.gov/pubmed/33137083 http://dx.doi.org/10.1371/journal.pcbi.1008334