Cargando…

Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping

Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time...

Descripción completa

Detalles Bibliográficos
Autores principales: Rakthanmanon, Thanawin, Campana, Bilson, Mueen, Abdullah, Batista, Gustavo, Westover, Brandon, Zhu, Qiang, Zakaria, Jesin, Keogh, Eamonn
Formato: Online Artículo Texto
Lenguaje:English
Publicado: 2012
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6816304/
https://www.ncbi.nlm.nih.gov/pubmed/31660254
http://dx.doi.org/10.1145/2339530.2339576
_version_ 1783463349524103168
author Rakthanmanon, Thanawin
Campana, Bilson
Mueen, Abdullah
Batista, Gustavo
Westover, Brandon
Zhu, Qiang
Zakaria, Jesin
Keogh, Eamonn
author_facet Rakthanmanon, Thanawin
Campana, Bilson
Mueen, Abdullah
Batista, Gustavo
Westover, Brandon
Zhu, Qiang
Zakaria, Jesin
Keogh, Eamonn
author_sort Rakthanmanon, Thanawin
collection PubMed
description Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few millions of time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact; in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problem such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower powered devices than are currently possible.
format Online
Article
Text
id pubmed-6816304
institution National Center for Biotechnology Information
language English
publishDate 2012
record_format MEDLINE/PubMed
spelling pubmed-68163042019-10-28 Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping Rakthanmanon, Thanawin Campana, Bilson Mueen, Abdullah Batista, Gustavo Westover, Brandon Zhu, Qiang Zakaria, Jesin Keogh, Eamonn KDD Article Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few millions of time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact; in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problem such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower powered devices than are currently possible. 2012-08 /pmc/articles/PMC6816304/ /pubmed/31660254 http://dx.doi.org/10.1145/2339530.2339576 Text en http://creativecommons.org/licenses/by/4.0/ Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
spellingShingle Article
Rakthanmanon, Thanawin
Campana, Bilson
Mueen, Abdullah
Batista, Gustavo
Westover, Brandon
Zhu, Qiang
Zakaria, Jesin
Keogh, Eamonn
Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping
title Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping
title_full Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping
title_fullStr Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping
title_full_unstemmed Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping
title_short Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping
title_sort searching and mining trillions of time series subsequences under dynamic time warping
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6816304/
https://www.ncbi.nlm.nih.gov/pubmed/31660254
http://dx.doi.org/10.1145/2339530.2339576
work_keys_str_mv AT rakthanmanonthanawin searchingandminingtrillionsoftimeseriessubsequencesunderdynamictimewarping
AT campanabilson searchingandminingtrillionsoftimeseriessubsequencesunderdynamictimewarping
AT mueenabdullah searchingandminingtrillionsoftimeseriessubsequencesunderdynamictimewarping
AT batistagustavo searchingandminingtrillionsoftimeseriessubsequencesunderdynamictimewarping
AT westoverbrandon searchingandminingtrillionsoftimeseriessubsequencesunderdynamictimewarping
AT zhuqiang searchingandminingtrillionsoftimeseriessubsequencesunderdynamictimewarping
AT zakariajesin searchingandminingtrillionsoftimeseriessubsequencesunderdynamictimewarping
AT keogheamonn searchingandminingtrillionsoftimeseriessubsequencesunderdynamictimewarping