Cargando…

NetNCSP: Nonoverlapping closed sequential pattern mining

Sequential pattern mining (SPM) has been applied in many fields. However, traditional SPM neglects the pattern repetition in sequence. To solve this problem, gap constraint SPM was proposed and can avoid finding too many useless patterns. Nonoverlapping SPM, as a branch of gap constraint SPM, means...

Descripción completa

Detalles Bibliográficos
Autores principales: Wu, Youxi, Zhu, Changrui, Li, Yan, Guo, Lei, Wu, Xindong
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Elsevier B.V. 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7118609/
https://www.ncbi.nlm.nih.gov/pubmed/32292248
http://dx.doi.org/10.1016/j.knosys.2020.105812
_version_ 1783514596047323136
author Wu, Youxi
Zhu, Changrui
Li, Yan
Guo, Lei
Wu, Xindong
author_facet Wu, Youxi
Zhu, Changrui
Li, Yan
Guo, Lei
Wu, Xindong
author_sort Wu, Youxi
collection PubMed
description Sequential pattern mining (SPM) has been applied in many fields. However, traditional SPM neglects the pattern repetition in sequence. To solve this problem, gap constraint SPM was proposed and can avoid finding too many useless patterns. Nonoverlapping SPM, as a branch of gap constraint SPM, means that any two occurrences cannot use the same sequence letter in the same position as the occurrences. Nonoverlapping SPM can make a balance between efficiency and completeness. The frequent patterns discovered by existing methods normally contain redundant patterns. To reduce redundant patterns and improve the mining performance, this paper adopts the closed pattern mining strategy and proposes a complete algorithm, named Nettree for Nonoverlapping Closed Sequential Pattern (NetNCSP) based on the Nettree structure. NetNCSP is equipped with two key steps, support calculation and closeness determination. A backtracking strategy is employed to calculate the nonoverlapping support of a pattern on the corresponding Nettree, which reduces the time complexity. This paper also proposes three kinds of pruning strategies, inheriting, predicting, and determining. These pruning strategies are able to find the redundant patterns effectively since the strategies can predict the frequency and closeness of the patterns before the generation of the candidate patterns. Experimental results show that NetNCSP is not only more efficient but can also discover more closed patterns with good compressibility. Furtherly, in biological experiments NetNCSP mines the closed patterns in SARS-CoV-2 and SARS viruses. The results show that the two viruses are of similar pattern composition with different combinations.
format Online
Article
Text
id pubmed-7118609
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier B.V.
record_format MEDLINE/PubMed
spelling pubmed-71186092020-04-03 NetNCSP: Nonoverlapping closed sequential pattern mining Wu, Youxi Zhu, Changrui Li, Yan Guo, Lei Wu, Xindong Knowl Based Syst Article Sequential pattern mining (SPM) has been applied in many fields. However, traditional SPM neglects the pattern repetition in sequence. To solve this problem, gap constraint SPM was proposed and can avoid finding too many useless patterns. Nonoverlapping SPM, as a branch of gap constraint SPM, means that any two occurrences cannot use the same sequence letter in the same position as the occurrences. Nonoverlapping SPM can make a balance between efficiency and completeness. The frequent patterns discovered by existing methods normally contain redundant patterns. To reduce redundant patterns and improve the mining performance, this paper adopts the closed pattern mining strategy and proposes a complete algorithm, named Nettree for Nonoverlapping Closed Sequential Pattern (NetNCSP) based on the Nettree structure. NetNCSP is equipped with two key steps, support calculation and closeness determination. A backtracking strategy is employed to calculate the nonoverlapping support of a pattern on the corresponding Nettree, which reduces the time complexity. This paper also proposes three kinds of pruning strategies, inheriting, predicting, and determining. These pruning strategies are able to find the redundant patterns effectively since the strategies can predict the frequency and closeness of the patterns before the generation of the candidate patterns. Experimental results show that NetNCSP is not only more efficient but can also discover more closed patterns with good compressibility. Furtherly, in biological experiments NetNCSP mines the closed patterns in SARS-CoV-2 and SARS viruses. The results show that the two viruses are of similar pattern composition with different combinations. Elsevier B.V. 2020-05-21 2020-03-31 /pmc/articles/PMC7118609/ /pubmed/32292248 http://dx.doi.org/10.1016/j.knosys.2020.105812 Text en © 2020 Elsevier B.V. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
spellingShingle Article
Wu, Youxi
Zhu, Changrui
Li, Yan
Guo, Lei
Wu, Xindong
NetNCSP: Nonoverlapping closed sequential pattern mining
title NetNCSP: Nonoverlapping closed sequential pattern mining
title_full NetNCSP: Nonoverlapping closed sequential pattern mining
title_fullStr NetNCSP: Nonoverlapping closed sequential pattern mining
title_full_unstemmed NetNCSP: Nonoverlapping closed sequential pattern mining
title_short NetNCSP: Nonoverlapping closed sequential pattern mining
title_sort netncsp: nonoverlapping closed sequential pattern mining
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7118609/
https://www.ncbi.nlm.nih.gov/pubmed/32292248
http://dx.doi.org/10.1016/j.knosys.2020.105812
work_keys_str_mv AT wuyouxi netncspnonoverlappingclosedsequentialpatternmining
AT zhuchangrui netncspnonoverlappingclosedsequentialpatternmining
AT liyan netncspnonoverlappingclosedsequentialpatternmining
AT guolei netncspnonoverlappingclosedsequentialpatternmining
AT wuxindong netncspnonoverlappingclosedsequentialpatternmining