
Time Series Mining at Petascale Performance

The mining of time series data plays an important role in modern information retrieval and analysis systems. In particular, the identification of similarities within and across time series has garnered significant attention and effort over the last few years. For this task, the class of matrix profi...


Bibliographic Details
Main Authors:	Raoofy, Amir, Karlstetter, Roman, Yang, Dai, Trinitis, Carsten, Schulz, Martin
Format:	Online Article Text
Language:	English
Published:	2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295345/ http://dx.doi.org/10.1007/978-3-030-50743-5_6

_version_	1783546632875278336
author	Raoofy, Amir Karlstetter, Roman Yang, Dai Trinitis, Carsten Schulz, Martin
author_facet	Raoofy, Amir Karlstetter, Roman Yang, Dai Trinitis, Carsten Schulz, Martin
author_sort	Raoofy, Amir
collection	PubMed
description	The mining of time series data plays an important role in modern information retrieval and analysis systems. In particular, the identification of similarities within and across time series has garnered significant attention and effort over the last few years. For this task, the class of matrix profile algorithms, which create a generic structure that encodes correlations among records and dimensions—the matrix profile—is a promising approach, as it allows simplified post-processing and analysis steps by examining the resulting matrix profile structure. However, it is expensive to create a matrix profile: it requires significant computational power to evaluate the distance among all subsequence pairs in a time series, especially for very long and multi-dimensional time series with a large dimensionality. Existing approaches are limited in their scalability, as they do not target High Performance Computing systems, and—for most realistic problems—are suited only for datasets with a small dimensionality. In this paper, we introduce a novel MPI-based approach for the calculation of a matrix profile for multi-dimensional time series that pushes these limits. We evaluate the efficiency of our approach using an analytical performance model combined with experimental data. Finally, we demonstrate our solution on a 128-dimensional time series dataset of 1 million records, solving 274 trillion sorts at a sustained 1.3 Petaflop/s performance on the SuperMUC-NG system.
format	Online Article Text
id	pubmed-7295345
institution	National Center for Biotechnology Information
language	English
publishDate	2020
record_format	MEDLINE/PubMed
spelling	pubmed-72953452020-06-16 Time Series Mining at Petascale Performance Raoofy, Amir Karlstetter, Roman Yang, Dai Trinitis, Carsten Schulz, Martin High Performance Computing Article The mining of time series data plays an important role in modern information retrieval and analysis systems. In particular, the identification of similarities within and across time series has garnered significant attention and effort over the last few years. For this task, the class of matrix profile algorithms, which create a generic structure that encodes correlations among records and dimensions—the matrix profile—is a promising approach, as it allows simplified post-processing and analysis steps by examining the resulting matrix profile structure. However, it is expensive to create a matrix profile: it requires significant computational power to evaluate the distance among all subsequence pairs in a time series, especially for very long and multi-dimensional time series with a large dimensionality. Existing approaches are limited in their scalability, as they do not target High Performance Computing systems, and—for most realistic problems—are suited only for datasets with a small dimensionality. In this paper, we introduce a novel MPI-based approach for the calculation of a matrix profile for multi-dimensional time series that pushes these limits. We evaluate the efficiency of our approach using an analytical performance model combined with experimental data. Finally, we demonstrate our solution on a 128-dimensional time series dataset of 1 million records, solving 274 trillion sorts at a sustained 1.3 Petaflop/s performance on the SuperMUC-NG system. 2020-05-22 /pmc/articles/PMC7295345/ http://dx.doi.org/10.1007/978-3-030-50743-5_6 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. 
These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle	Article Raoofy, Amir Karlstetter, Roman Yang, Dai Trinitis, Carsten Schulz, Martin Time Series Mining at Petascale Performance
title	Time Series Mining at Petascale Performance
title_full	Time Series Mining at Petascale Performance
title_fullStr	Time Series Mining at Petascale Performance
title_full_unstemmed	Time Series Mining at Petascale Performance
title_short	Time Series Mining at Petascale Performance
title_sort	time series mining at petascale performance
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295345/ http://dx.doi.org/10.1007/978-3-030-50743-5_6
work_keys_str_mv	AT raoofyamir timeseriesminingatpetascaleperformance AT karlstetterroman timeseriesminingatpetascaleperformance AT yangdai timeseriesminingatpetascaleperformance AT trinitiscarsten timeseriesminingatpetascaleperformance AT schulzmartin timeseriesminingatpetascaleperformance


Similar Items