Benchmarking Attention-Based Interpretability of Deep Learning in Multivariate Time Series Predictions

The adaptation of deep learning models within safety-critical systems cannot rely only on good prediction performance but needs to provide interpretable and robust explanations for their decisions. When modeling complex sequences, attention mechanisms are regarded as the established approach to supp...

Bibliographic Details
Main Authors: Barić, Domjan, Fumić, Petar, Horvatić, Davor, Lipic, Tomislav
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7912396/
https://www.ncbi.nlm.nih.gov/pubmed/33503822
http://dx.doi.org/10.3390/e23020143
author Barić, Domjan
Fumić, Petar
Horvatić, Davor
Lipic, Tomislav
collection PubMed
description The adaptation of deep learning models within safety-critical systems cannot rely only on good prediction performance but needs to provide interpretable and robust explanations for their decisions. When modeling complex sequences, attention mechanisms are regarded as the established approach to support deep neural networks with intrinsic interpretability. This paper focuses on the emerging trend of specifically designing diagnostic datasets for understanding the inner workings of attention mechanism based deep learning models for multivariate forecasting tasks. We design a novel benchmark of synthetically designed datasets with the transparent underlying generating process of multiple time series interactions with increasing complexity. The benchmark enables empirical evaluation of the performance of attention based deep neural networks in three different aspects: (i) prediction performance score, (ii) interpretability correctness, (iii) sensitivity analysis. Our analysis shows that although most models have satisfying and stable prediction performance results, they often fail to give correct interpretability. The only model with both a satisfying performance score and correct interpretability is IMV-LSTM, capturing both autocorrelations and crosscorrelations between multiple time series. Interestingly, while evaluating IMV-LSTM on simulated data from statistical and mechanistic models, the correctness of interpretability increases with more complex datasets.
format Online
Article
Text
id pubmed-7912396
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7912396 2021-02-28 Benchmarking Attention-Based Interpretability of Deep Learning in Multivariate Time Series Predictions Barić, Domjan Fumić, Petar Horvatić, Davor Lipic, Tomislav Entropy (Basel) Article MDPI 2021-01-25 /pmc/articles/PMC7912396/ /pubmed/33503822 http://dx.doi.org/10.3390/e23020143 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Benchmarking Attention-Based Interpretability of Deep Learning in Multivariate Time Series Predictions