Global importance analysis: An interpretability method to quantify importance of genomic features in deep neural networks


Bibliographic Details
Main Authors: Koo, Peter K., Majdandzic, Antonio, Ploenzke, Matthew, Anand, Praveen, Paul, Steffan B.
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8118286/
https://www.ncbi.nlm.nih.gov/pubmed/33983921
http://dx.doi.org/10.1371/journal.pcbi.1008925
Collection: PubMed
Description: Deep neural networks have demonstrated improved performance at predicting the sequence specificities of DNA- and RNA-binding proteins compared to previous methods that rely on k-mers and position weight matrices. To gain insights into why a DNN makes a given prediction, model interpretability methods, such as attribution methods, can be employed to identify motif-like representations along a given sequence. Because explanations are given on an individual sequence basis and can vary substantially across sequences, deducing generalizable trends across the dataset and quantifying their effect size remains a challenge. Here we introduce global importance analysis (GIA), a model interpretability method that quantifies the population-level effect size that putative patterns have on model predictions. GIA provides an avenue to quantitatively test hypotheses of putative patterns and their interactions with other patterns, as well as map out specific functions the network has learned. As a case study, we demonstrate the utility of GIA on the computational task of predicting RNA-protein interactions from sequence. We first introduce a convolutional network we call ResidualBind and benchmark its performance against previous methods on RNAcompete data. Using GIA, we then demonstrate that in addition to sequence motifs, ResidualBind learns a model that considers the number of motifs, their spacing, and sequence context, such as RNA secondary structure and GC-bias.
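The core idea described in the abstract, measuring the population-level effect of a putative pattern by embedding it into background sequences and comparing model predictions with and without it, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function and variable names (`global_importance`, `model_predict`, `one_hot`) are hypothetical, and a one-hot RNA encoding is assumed.

```python
import numpy as np

def one_hot(seq, alphabet="ACGU"):
    """One-hot encode an RNA sequence into an array of shape (len(seq), 4)."""
    idx = {c: i for i, c in enumerate(alphabet)}
    x = np.zeros((len(seq), len(alphabet)))
    for i, c in enumerate(seq):
        x[i, idx[c]] = 1.0
    return x

def global_importance(model_predict, background_seqs, pattern, position):
    """Estimate the global importance of `pattern` at `position`:
    the mean change in model predictions when the pattern is embedded
    into a population of background sequences."""
    x_bg = np.stack([one_hot(s) for s in background_seqs])
    x_embed = x_bg.copy()
    # Overwrite the background at the chosen position with the pattern.
    x_embed[:, position:position + len(pattern), :] = one_hot(pattern)
    # Population-level effect size: mean prediction difference.
    return float(np.mean(model_predict(x_embed) - model_predict(x_bg)))
```

In practice `model_predict` would be a trained network such as ResidualBind, and the background sequences would be sampled so that their statistics (e.g. GC content) match the data distribution; here any callable mapping a batch of one-hot arrays to scores will do.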
ID: pubmed-8118286
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: PLoS Comput Biol (Research Article)
Published: Public Library of Science, 2021-05-13
© 2021 Koo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
https://www.ncbi.nlm.nih.gov/pubmed/33983921
http://dx.doi.org/10.1371/journal.pcbi.1008925