Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions
BACKGROUND: Gene expression is mediated by specialized cis-regulatory modules (CRMs), the most prominent of which are called enhancers. Early experiments indicated that enhancers located far from the gene promoters are often responsible for mediating gene transcription. Knowing their properties, reg...
Main authors: | Kim, Seong Gon; Theera-Ampornpunt, Nawanol; Fang, Chih-Hao; Harwani, Mrudul; Grama, Ananth; Chaterji, Somali |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4977478/ https://www.ncbi.nlm.nih.gov/pubmed/27490187 http://dx.doi.org/10.1186/s12918-016-0302-3 |
_version_ | 1782447034001784832 |
---|---|
author | Kim, Seong Gon Theera-Ampornpunt, Nawanol Fang, Chih-Hao Harwani, Mrudul Grama, Ananth Chaterji, Somali |
author_sort | Kim, Seong Gon |
collection | PubMed |
description | BACKGROUND: Gene expression is mediated by specialized cis-regulatory modules (CRMs), the most prominent of which are called enhancers. Early experiments indicated that enhancers located far from the gene promoters are often responsible for mediating gene transcription. Knowing their properties, regulatory activity, and genomic targets is crucial to the functional understanding of cellular events, ranging from cellular homeostasis to differentiation. Recent genome-wide investigation of epigenomic marks has indicated that enhancer elements could be enriched for certain epigenomic marks, such as combinatorial patterns of histone modifications. METHODS: Our efforts in this paper are motivated by these recent advances in epigenomic profiling methods, which have uncovered enhancer-associated chromatin features in different cell types and organisms. Specifically, we use recent state-of-the-art Deep Learning methods and develop a deep neural network (DNN)-based architecture, called EP-DNN, to predict the presence and types of enhancers in the human genome. It uses as features the expression levels of the histone modifications at the peaks of the functional sites as well as in their adjacent regions. We apply EP-DNN to four different cell types: H1, IMR90, HepG2, and HeLa S3. We train EP-DNN using p300 binding sites as enhancers, and TSS and random non-DHS sites as non-enhancers. We perform EP-DNN predictions to quantify the validation rate for different levels of confidence in the predictions and also perform comparisons against two state-of-the-art computational models for enhancer predictions, DEEP-ENCODE and RFECS. RESULTS: We find that EP-DNN has superior accuracy and takes less time to make predictions. Next, we develop methods to make EP-DNN interpretable by computing the importance of each input feature in the classification task. This analysis indicates that the important histone modifications were distinct for different cell types, with some overlaps, e.g., H3K27ac was important in cell type H1 but less so in HeLa S3, while H3K4me1 was relatively important in all four cell types. We finally use the feature importance analysis to reduce the number of input features needed to train the DNN, thus reducing training time, which is often the computational bottleneck in the use of a DNN. CONCLUSIONS: In this paper, we developed EP-DNN, which has high accuracy of prediction, with validation rates above 90% for the operational region of enhancer prediction for all four cell lines that we studied, outperforming DEEP-ENCODE and RFECS. Then, we developed a method to analyze a trained DNN and determine which histone modifications are important, and within that, which features, proximal or distal to the enhancer site, are important. |
format | Online Article Text |
id | pubmed-4977478 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4977478 2016-08-17 Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions Kim, Seong Gon Theera-Ampornpunt, Nawanol Fang, Chih-Hao Harwani, Mrudul Grama, Ananth Chaterji, Somali BMC Syst Biol Research BioMed Central 2016-08-01 /pmc/articles/PMC4977478/ /pubmed/27490187 http://dx.doi.org/10.1186/s12918-016-0302-3 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Kim, Seong Gon Theera-Ampornpunt, Nawanol Fang, Chih-Hao Harwani, Mrudul Grama, Ananth Chaterji, Somali Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions |
title | Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions |
title_sort | opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4977478/ https://www.ncbi.nlm.nih.gov/pubmed/27490187 http://dx.doi.org/10.1186/s12918-016-0302-3 |
work_keys_str_mv | AT kimseonggon openinguptheblackboxaninterpretabledeepneuralnetworkbasedclassifierforcelltypespecificenhancerpredictions AT theeraampornpuntnawanol openinguptheblackboxaninterpretabledeepneuralnetworkbasedclassifierforcelltypespecificenhancerpredictions AT fangchihhao openinguptheblackboxaninterpretabledeepneuralnetworkbasedclassifierforcelltypespecificenhancerpredictions AT harwanimrudul openinguptheblackboxaninterpretabledeepneuralnetworkbasedclassifierforcelltypespecificenhancerpredictions AT gramaananth openinguptheblackboxaninterpretabledeepneuralnetworkbasedclassifierforcelltypespecificenhancerpredictions AT chaterjisomali openinguptheblackboxaninterpretabledeepneuralnetworkbasedclassifierforcelltypespecificenhancerpredictions |
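
The abstract reports that EP-DNN was made interpretable by computing the importance of each input feature, but this record does not say how that score is computed. As a rough illustration only, the sketch below assumes permutation importance (a swapped-in technique, not necessarily the authors' method) and uses a small logistic-regression model as a stand-in for the DNN. The data are synthetic, and the column labels `mark_a`, `mark_b`, and `mark_c` are hypothetical placeholders rather than real histone modifications.

```python
import math
import random


def predict_score(w, b, x):
    """Linear score of one sample; positive means class 1."""
    return b + sum(wj * xj for wj, xj in zip(w, x))


def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit a logistic-regression stand-in by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-predict_score(w, b, xi)))
            err = p - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Fraction of samples whose predicted class matches the label."""
    hits = sum(int((predict_score(w, b, xi) > 0) == (yi == 1))
               for xi, yi in zip(X, y))
    return hits / len(X)


def permutation_importance(w, b, X, y, rng):
    """Accuracy drop when one feature column is shuffled across samples."""
    base = accuracy(w, b, X, y)
    drops = []
    for j in range(len(X[0])):
        column = [xi[j] for xi in X]
        rng.shuffle(column)
        X_perm = [xi[:j] + [v] + xi[j + 1:] for xi, v in zip(X, column)]
        drops.append(base - accuracy(w, b, X_perm, y))
    return drops


# Synthetic data (assumption): the label depends strongly on mark_a,
# weakly on mark_b, and not at all on mark_c.
rng = random.Random(0)
feature_names = ["mark_a", "mark_b", "mark_c"]  # hypothetical labels
X = [[rng.gauss(0, 1) for _ in feature_names] for _ in range(300)]
y = [1 if 2.0 * x[0] + 0.5 * x[1] + 0.1 * rng.gauss(0, 1) > 0 else 0
     for x in X]

w, b = train_logistic(X, y)
importances = permutation_importance(w, b, X, y, rng)
for name, score in zip(feature_names, importances):
    print(f"{name}: {score:.3f}")
```

Features with near-zero importance in such a ranking are candidates for removal before retraining, which is the kind of input reduction the abstract describes as a way to shorten training time.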