
Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions

BACKGROUND: Gene expression is mediated by specialized cis-regulatory modules (CRMs), the most prominent of which are called enhancers. Early experiments indicated that enhancers located far from the gene promoters are often responsible for mediating gene transcription. Knowing their properties, reg...


Bibliographic Details
Main authors: Kim, Seong Gon, Theera-Ampornpunt, Nawanol, Fang, Chih-Hao, Harwani, Mrudul, Grama, Ananth, Chaterji, Somali
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4977478/
https://www.ncbi.nlm.nih.gov/pubmed/27490187
http://dx.doi.org/10.1186/s12918-016-0302-3
_version_ 1782447034001784832
author Kim, Seong Gon
Theera-Ampornpunt, Nawanol
Fang, Chih-Hao
Harwani, Mrudul
Grama, Ananth
Chaterji, Somali
collection PubMed
description BACKGROUND: Gene expression is mediated by specialized cis-regulatory modules (CRMs), the most prominent of which are called enhancers. Early experiments indicated that enhancers located far from the gene promoters are often responsible for mediating gene transcription. Knowing their properties, regulatory activity, and genomic targets is crucial to the functional understanding of cellular events, ranging from cellular homeostasis to differentiation. Recent genome-wide investigation of epigenomic marks has indicated that enhancer elements could be enriched for certain epigenomic marks, such as combinatorial patterns of histone modifications.
METHODS: Our efforts in this paper are motivated by these recent advances in epigenomic profiling methods, which have uncovered enhancer-associated chromatin features in different cell types and organisms. Specifically, we use recent state-of-the-art deep learning methods to develop a deep neural network (DNN)-based architecture, called EP-DNN, to predict the presence and types of enhancers in the human genome. As features, it uses the expression levels of the histone modifications at the peaks of the functional sites as well as in their adjacent regions. We apply EP-DNN to four different cell types: H1, IMR90, HepG2, and HeLa S3. We train EP-DNN using p300 binding sites as enhancers, and transcription start sites (TSS) and random non-DHS (DNase I hypersensitive) sites as non-enhancers. We use EP-DNN predictions to quantify the validation rate at different levels of confidence in the predictions, and we compare against two state-of-the-art computational models for enhancer prediction, DEEP-ENCODE and RFECS.
RESULTS: We find that EP-DNN has superior accuracy and takes less time to make predictions. Next, we develop methods to make EP-DNN interpretable by computing the importance of each input feature in the classification task. This analysis indicates that the important histone modifications were distinct for different cell types, with some overlaps; e.g., H3K27ac was important in cell type H1 but less so in HeLa S3, while H3K4me1 was relatively important in all four cell types. Finally, we use the feature importance analysis to reduce the number of input features needed to train the DNN, thus reducing training time, which is often the computational bottleneck in the use of a DNN.
CONCLUSIONS: In this paper, we developed EP-DNN, which has high prediction accuracy, with validation rates above 90% for the operational region of enhancer prediction for all four cell lines that we studied, outperforming DEEP-ENCODE and RFECS. We then developed a method to analyze a trained DNN and determine which histone modifications are important and, within those, which features proximal or distal to the enhancer site are important.
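The feature-importance and feature-reduction steps mentioned in the abstract can be illustrated with a small sketch. This is not the authors' method, which the abstract does not specify. It uses permutation importance, a generic technique swapped in for illustration: shuffle one input feature and measure how much accuracy drops. The feature names, the synthetic data, the stand-in `predict` rule (in place of a trained DNN), and the 0.01 threshold are all hypothetical.

```python
import random

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    base = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the label
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical feature names: two histone marks plus an uninformative column.
feature_names = ["H3K27ac", "H3K4me1", "noise"]

# Synthetic data (an assumption): a site is an "enhancer" when the two
# mark signals together exceed 1.0; the third column plays no role.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in feature_names] for _ in range(200)]
y = [int(row[0] + row[1] > 1.0) for row in X]

def predict(row):
    """Stand-in for a trained classifier; here it simply mirrors the label rule."""
    return int(row[0] + row[1] > 1.0)

importances = permutation_importance(predict, X, y)

# Feature reduction: keep only features whose importance exceeds a
# hypothetical threshold, then retrain on the reduced set.
kept = [name for name, imp in zip(feature_names, importances) if imp > 0.01]

for name, imp in zip(feature_names, importances):
    print(f"{name}: {imp:.3f}")
print("kept:", kept)
```

In this toy setting the uninformative column receives an importance of exactly zero, because the stand-in model never reads it, and would be dropped before retraining. With a real trained network, `predict` would wrap the model's forward pass, and importances should be estimated on held-out data.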
format Online
Article
Text
id pubmed-4977478
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4977478 2016-08-17 Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions Kim, Seong Gon Theera-Ampornpunt, Nawanol Fang, Chih-Hao Harwani, Mrudul Grama, Ananth Chaterji, Somali BMC Syst Biol Research BioMed Central 2016-08-01 /pmc/articles/PMC4977478/ /pubmed/27490187 http://dx.doi.org/10.1186/s12918-016-0302-3 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4977478/
https://www.ncbi.nlm.nih.gov/pubmed/27490187
http://dx.doi.org/10.1186/s12918-016-0302-3