Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions

Bibliographic Details
Main authors:	Kim, Seong Gon; Theera-Ampornpunt, Nawanol; Fang, Chih-Hao; Harwani, Mrudul; Grama, Ananth; Chaterji, Somali
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4977478/ https://www.ncbi.nlm.nih.gov/pubmed/27490187 http://dx.doi.org/10.1186/s12918-016-0302-3

author	Kim, Seong Gon; Theera-Ampornpunt, Nawanol; Fang, Chih-Hao; Harwani, Mrudul; Grama, Ananth; Chaterji, Somali
collection	PubMed
description	BACKGROUND: Gene expression is mediated by specialized cis-regulatory modules (CRMs), the most prominent of which are called enhancers. Early experiments indicated that enhancers located far from gene promoters are often responsible for mediating gene transcription. Knowing their properties, regulatory activity, and genomic targets is crucial to the functional understanding of cellular events, ranging from cellular homeostasis to differentiation. Recent genome-wide investigation of epigenomic marks has indicated that enhancer elements could be enriched for certain epigenomic marks, such as combinatorial patterns of histone modifications. METHODS: Our efforts in this paper are motivated by these recent advances in epigenomic profiling methods, which have uncovered enhancer-associated chromatin features in different cell types and organisms. Specifically, we use recent state-of-the-art deep learning methods to develop a deep neural network (DNN)-based architecture, called EP-DNN, that predicts the presence and types of enhancers in the human genome. As features, it uses the expression levels of the histone modifications at the peaks of the functional sites as well as in their adjacent regions. We apply EP-DNN to four different cell types: H1, IMR90, HepG2, and HeLa S3. We train EP-DNN using p300 binding sites as enhancers, and TSS and random non-DHS sites as non-enhancers. We use EP-DNN predictions to quantify the validation rate at different confidence levels, and we compare EP-DNN against two state-of-the-art computational models for enhancer prediction, DEEP-ENCODE and RFECS. RESULTS: We find that EP-DNN has superior accuracy and takes less time to make predictions. Next, we develop methods to make EP-DNN interpretable by computing the importance of each input feature in the classification task.
This analysis indicates that the important histone modifications were distinct for different cell types, with some overlaps; e.g., H3K27ac was important in cell type H1 but less so in HeLa S3, while H3K4me1 was relatively important in all four cell types. Finally, we use the feature importance analysis to reduce the number of input features needed to train the DNN, thus reducing training time, which is often the computational bottleneck in the use of a DNN. CONCLUSIONS: In this paper, we developed EP-DNN, which has high prediction accuracy, with validation rates above 90% in the operational region of enhancer prediction for all four cell lines that we studied, outperforming DEEP-ENCODE and RFECS. We then developed a method to analyze a trained DNN and determine which histone modifications are important and, within each, which features proximal or distal to the enhancer site are important.
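The abstract describes three interpretability steps. First, features are ranked by importance. Second, the ranking is read per histone mark. Third, a reduced feature set is used for training. The abstract does not state how EP-DNN computes feature importance.

The Python sketch below therefore swaps in a different, well-known technique: permutation importance. It shuffles one feature column and measures the drop in accuracy. It should not be read as the authors' method. The following are all hypothetical:

- the feature layout (three marks by five bins centred on the peak);
- the mark list;
- the threshold function standing in for a trained EP-DNN.

```python
import numpy as np

# Hypothetical feature layout (not taken from the paper): for each histone mark,
# N_BINS signal values, with the centre bin at the peak and the rest in flanking regions.
MARKS = ["H3K4me1", "H3K27ac", "H3K4me3"]
N_BINS = 5
CENTER = N_BINS // 2


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled.

    Permutation importance is a stand-in here; it is not claimed to be the
    importance measure used for EP-DNN.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link to y but keeps its marginal distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        scores[j] = np.mean(drops)
    return scores


def importance_per_mark(scores, n_marks=len(MARKS), n_bins=N_BINS):
    """Sum per-feature scores over bins, assuming a mark-major column order."""
    return scores.reshape(n_marks, n_bins).sum(axis=1)


def select_marks(X, keep, n_bins=N_BINS):
    """Keep only the columns of the given mark indices (mark-major order)."""
    cols = [m * n_bins + b for m in keep for b in range(n_bins)]
    return X[:, cols]


# Synthetic data: labels depend only on H3K4me1 at the peak bin.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, len(MARKS) * N_BINS))
feature_idx = MARKS.index("H3K4me1") * N_BINS + CENTER
y = (X[:, feature_idx] > 0).astype(int)


def toy_predict(X):
    """Hypothetical stand-in for a trained classifier: thresholds one feature."""
    return (X[:, feature_idx] > 0).astype(int)


scores = permutation_importance(toy_predict, X, y)
per_mark = importance_per_mark(scores)
ranking = [MARKS[i] for i in np.argsort(per_mark)[::-1]]
print("importance per mark:", dict(zip(MARKS, per_mark.round(2).tolist())))
print("ranking:", ranking)

# Feature reduction: a model would then be retrained (not shown) on these columns only.
X_reduced = select_marks(X, [MARKS.index(m) for m in ranking[:1]])
print(X_reduced.shape)  # (400, 5)
```

The toy classifier reads only one column. Shuffling any other column therefore leaves its predictions unchanged and gives an importance of exactly zero. With a real DNN the scores are noisy, and `n_repeats` trades runtime for stability.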
format	Online Article Text
id	pubmed-4977478
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4977478 2016-08-17 BMC Syst Biol Research BioMed Central 2016-08-01 /pmc/articles/PMC4977478/ /pubmed/27490187 http://dx.doi.org/10.1186/s12918-016-0302-3 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4977478/ https://www.ncbi.nlm.nih.gov/pubmed/27490187 http://dx.doi.org/10.1186/s12918-016-0302-3
