Cargando…

A Node Linkage Approach for Sequential Pattern Mining

Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large...

Descripción completa

Detalles Bibliográficos
Autores principales: Navarro, Osvaldo, Cumplido, René, Villaseñor-Pineda, Luis, Feregrino-Uribe, Claudia, Carrasco-Ochoa, Jesús Ariel
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2014
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4059635/
https://www.ncbi.nlm.nih.gov/pubmed/24933123
http://dx.doi.org/10.1371/journal.pone.0095418
_version_ 1782321259639472128
author Navarro, Osvaldo
Cumplido, René
Villaseñor-Pineda, Luis
Feregrino-Uribe, Claudia
Carrasco-Ochoa, Jesús Ariel
author_facet Navarro, Osvaldo
Cumplido, René
Villaseñor-Pineda, Luis
Feregrino-Uribe, Claudia
Carrasco-Ochoa, Jesús Ariel
author_sort Navarro, Osvaldo
collection PubMed
description Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state of the art algorithms.
format Online
Article
Text
id pubmed-4059635
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-40596352014-06-19 A Node Linkage Approach for Sequential Pattern Mining Navarro, Osvaldo Cumplido, René Villaseñor-Pineda, Luis Feregrino-Uribe, Claudia Carrasco-Ochoa, Jesús Ariel PLoS One Research Article Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state of the art algorithms. Public Library of Science 2014-06-16 /pmc/articles/PMC4059635/ /pubmed/24933123 http://dx.doi.org/10.1371/journal.pone.0095418 Text en © 2014 Navarro et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Navarro, Osvaldo
Cumplido, René
Villaseñor-Pineda, Luis
Feregrino-Uribe, Claudia
Carrasco-Ochoa, Jesús Ariel
A Node Linkage Approach for Sequential Pattern Mining
title A Node Linkage Approach for Sequential Pattern Mining
title_full A Node Linkage Approach for Sequential Pattern Mining
title_fullStr A Node Linkage Approach for Sequential Pattern Mining
title_full_unstemmed A Node Linkage Approach for Sequential Pattern Mining
title_short A Node Linkage Approach for Sequential Pattern Mining
title_sort node linkage approach for sequential pattern mining
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4059635/
https://www.ncbi.nlm.nih.gov/pubmed/24933123
http://dx.doi.org/10.1371/journal.pone.0095418
work_keys_str_mv AT navarroosvaldo anodelinkageapproachforsequentialpatternmining
AT cumplidorene anodelinkageapproachforsequentialpatternmining
AT villasenorpinedaluis anodelinkageapproachforsequentialpatternmining
AT feregrinouribeclaudia anodelinkageapproachforsequentialpatternmining
AT carrascoochoajesusariel anodelinkageapproachforsequentialpatternmining
AT navarroosvaldo nodelinkageapproachforsequentialpatternmining
AT cumplidorene nodelinkageapproachforsequentialpatternmining
AT villasenorpinedaluis nodelinkageapproachforsequentialpatternmining
AT feregrinouribeclaudia nodelinkageapproachforsequentialpatternmining
AT carrascoochoajesusariel nodelinkageapproachforsequentialpatternmining