Cargando…

Exploring efficient grouping algorithms in regular expression matching

BACKGROUND: Regular expression matching (REM) is widely employed as the major tool for deep packet inspection (DPI) applications. For automatic processing, the regular expression patterns need to be converted to a deterministic finite automata (DFA). However, with the ever-increasing scale and compl...

Descripción completa

Detalles Bibliográficos
Autores principales:	Xu, Chengcheng, Su, Jinshu, Chen, Shuhui
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Public Library of Science 2018
Materias:	Research Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6200238/ https://www.ncbi.nlm.nih.gov/pubmed/30356262 http://dx.doi.org/10.1371/journal.pone.0206068

_version_	1783365297086922752
author	Xu, Chengcheng Su, Jinshu Chen, Shuhui
author_facet	Xu, Chengcheng Su, Jinshu Chen, Shuhui
author_sort	Xu, Chengcheng
collection	PubMed
description	BACKGROUND: Regular expression matching (REM) is widely employed as the major tool for deep packet inspection (DPI) applications. For automatic processing, the regular expression patterns need to be converted to a deterministic finite automata (DFA). However, with the ever-increasing scale and complexity of pattern sets, state explosion problem has brought a great challenge to the DFA based regular expression matching. Rule grouping is a direct method to solve the state explosion problem. The original rule set is divided into multiple disjoint groups, and each group is compiled to a separate DFA, thus to significantly restrain the severe state explosion problem when compiling all the rules to a single DFA. OBJECTIVE: For practical implementation, the total number of DFA states should be as few as possible, thus the data structures of these DFAs can be deployed on fast on-chip memories for rapid access. In addition, to support fast pattern update in some applications, the time cost for grouping should be as small as possible. In this study, we aimed to propose an efficient grouping method, which generates as few states as possible with as little time overhead as possible. METHODS: When compiling multiple patterns into a single DFA, the number of DFA states is usually greater than the total number of states when compiling each pattern to a separate DFA. This is mainly caused by the semantic overlaps among different rules. By quantifying the interaction values for each pair of rules, the rule grouping problem can be reduced to the maximum k-cut graph partitioning problem. Then, we propose a heuristic algorithm called the one-step greedy (OSG) algorithm to solve this NP-hard problem. What’s more, a subroutine named the heuristic initialization (HI) algorithm is devised to further optimize the grouping algorithms. RESULTS: We employed three practical rule sets for the experimental evaluation. Results show that the OSG algorithm outperforms the state-of-the-art grouping solutions regarding both the total number of DFA states and time cost for grouping. The HI subroutine also demonstrates its significant optimization effect on the grouping algorithms. CONCLUSIONS: The DFA state explosion problem has became the most challenging issue in the regular expression matching applications. Rule grouping is a practical direction by dividing the original rule sets into multiple disjoint groups. In this paper, we investigate the current grouping solutions, and propose a compact and efficient grouping algorithm. Experiments conducted on practical rule sets demonstrate the superiority of our proposal.
format	Online Article Text
id	pubmed-6200238
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-62002382018-11-19 Exploring efficient grouping algorithms in regular expression matching Xu, Chengcheng Su, Jinshu Chen, Shuhui PLoS One Research Article BACKGROUND: Regular expression matching (REM) is widely employed as the major tool for deep packet inspection (DPI) applications. For automatic processing, the regular expression patterns need to be converted to a deterministic finite automata (DFA). However, with the ever-increasing scale and complexity of pattern sets, state explosion problem has brought a great challenge to the DFA based regular expression matching. Rule grouping is a direct method to solve the state explosion problem. The original rule set is divided into multiple disjoint groups, and each group is compiled to a separate DFA, thus to significantly restrain the severe state explosion problem when compiling all the rules to a single DFA. OBJECTIVE: For practical implementation, the total number of DFA states should be as few as possible, thus the data structures of these DFAs can be deployed on fast on-chip memories for rapid access. In addition, to support fast pattern update in some applications, the time cost for grouping should be as small as possible. In this study, we aimed to propose an efficient grouping method, which generates as few states as possible with as little time overhead as possible. METHODS: When compiling multiple patterns into a single DFA, the number of DFA states is usually greater than the total number of states when compiling each pattern to a separate DFA. This is mainly caused by the semantic overlaps among different rules. By quantifying the interaction values for each pair of rules, the rule grouping problem can be reduced to the maximum k-cut graph partitioning problem. Then, we propose a heuristic algorithm called the one-step greedy (OSG) algorithm to solve this NP-hard problem. What’s more, a subroutine named the heuristic initialization (HI) algorithm is devised to further optimize the grouping algorithms. RESULTS: We employed three practical rule sets for the experimental evaluation. Results show that the OSG algorithm outperforms the state-of-the-art grouping solutions regarding both the total number of DFA states and time cost for grouping. The HI subroutine also demonstrates its significant optimization effect on the grouping algorithms. CONCLUSIONS: The DFA state explosion problem has became the most challenging issue in the regular expression matching applications. Rule grouping is a practical direction by dividing the original rule sets into multiple disjoint groups. In this paper, we investigate the current grouping solutions, and propose a compact and efficient grouping algorithm. Experiments conducted on practical rule sets demonstrate the superiority of our proposal. Public Library of Science 2018-10-24 /pmc/articles/PMC6200238/ /pubmed/30356262 http://dx.doi.org/10.1371/journal.pone.0206068 Text en © 2018 Xu et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Xu, Chengcheng Su, Jinshu Chen, Shuhui Exploring efficient grouping algorithms in regular expression matching
title	Exploring efficient grouping algorithms in regular expression matching
title_full	Exploring efficient grouping algorithms in regular expression matching
title_fullStr	Exploring efficient grouping algorithms in regular expression matching
title_full_unstemmed	Exploring efficient grouping algorithms in regular expression matching
title_short	Exploring efficient grouping algorithms in regular expression matching
title_sort	exploring efficient grouping algorithms in regular expression matching
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6200238/ https://www.ncbi.nlm.nih.gov/pubmed/30356262 http://dx.doi.org/10.1371/journal.pone.0206068
work_keys_str_mv	AT xuchengcheng exploringefficientgroupingalgorithmsinregularexpressionmatching AT sujinshu exploringefficientgroupingalgorithmsinregularexpressionmatching AT chenshuhui exploringefficientgroupingalgorithmsinregularexpressionmatching

Exploring efficient grouping algorithms in regular expression matching

Ejemplares similares