Cargando…

A new split based searching for exact pattern matching for natural texts

Exact pattern matching algorithms are popular and used widely in several applications, such as molecular biology, text processing, image processing, web search engines, network intrusion detection systems and operating systems. The focus of these algorithms is to achieve time efficiency according to...

Descripción completa

Detalles Bibliográficos
Autores principales: Hakak, Saqib, Kamsin, Amirrudin, Shivakumara, Palaiahnakote, Idna Idris, Mohd Yamani, Gilkar, Gulshan Amin
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6062031/
https://www.ncbi.nlm.nih.gov/pubmed/30048486
http://dx.doi.org/10.1371/journal.pone.0200912
_version_ 1783342323995770880
author Hakak, Saqib
Kamsin, Amirrudin
Shivakumara, Palaiahnakote
Idna Idris, Mohd Yamani
Gilkar, Gulshan Amin
author_facet Hakak, Saqib
Kamsin, Amirrudin
Shivakumara, Palaiahnakote
Idna Idris, Mohd Yamani
Gilkar, Gulshan Amin
author_sort Hakak, Saqib
collection PubMed
description Exact pattern matching algorithms are popular and used widely in several applications, such as molecular biology, text processing, image processing, web search engines, network intrusion detection systems and operating systems. The focus of these algorithms is to achieve time efficiency according to applications but not memory consumption. In this work, we propose a novel idea to achieve both time efficiency and memory consumption by splitting query string for searching in Corpus. For a given text, the proposed algorithm split the query pattern into two equal halves and considers the second (right) half as a query string for searching in Corpus. Once the match is found with second halves, the proposed algorithm applies brute force procedure to find remaining match by referring the location of right half. Experimental results on different S1 Dataset, namely Arabic, English, Chinese, Italian and French text databases show that the proposed algorithm outperforms the existing S1 Algorithm in terms of time efficiency and memory consumption as the length of the query pattern increases.
format Online
Article
Text
id pubmed-6062031
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-60620312018-08-03 A new split based searching for exact pattern matching for natural texts Hakak, Saqib Kamsin, Amirrudin Shivakumara, Palaiahnakote Idna Idris, Mohd Yamani Gilkar, Gulshan Amin PLoS One Research Article Exact pattern matching algorithms are popular and used widely in several applications, such as molecular biology, text processing, image processing, web search engines, network intrusion detection systems and operating systems. The focus of these algorithms is to achieve time efficiency according to applications but not memory consumption. In this work, we propose a novel idea to achieve both time efficiency and memory consumption by splitting query string for searching in Corpus. For a given text, the proposed algorithm split the query pattern into two equal halves and considers the second (right) half as a query string for searching in Corpus. Once the match is found with second halves, the proposed algorithm applies brute force procedure to find remaining match by referring the location of right half. Experimental results on different S1 Dataset, namely Arabic, English, Chinese, Italian and French text databases show that the proposed algorithm outperforms the existing S1 Algorithm in terms of time efficiency and memory consumption as the length of the query pattern increases. Public Library of Science 2018-07-26 /pmc/articles/PMC6062031/ /pubmed/30048486 http://dx.doi.org/10.1371/journal.pone.0200912 Text en © 2018 Hakak et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Hakak, Saqib
Kamsin, Amirrudin
Shivakumara, Palaiahnakote
Idna Idris, Mohd Yamani
Gilkar, Gulshan Amin
A new split based searching for exact pattern matching for natural texts
title A new split based searching for exact pattern matching for natural texts
title_full A new split based searching for exact pattern matching for natural texts
title_fullStr A new split based searching for exact pattern matching for natural texts
title_full_unstemmed A new split based searching for exact pattern matching for natural texts
title_short A new split based searching for exact pattern matching for natural texts
title_sort new split based searching for exact pattern matching for natural texts
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6062031/
https://www.ncbi.nlm.nih.gov/pubmed/30048486
http://dx.doi.org/10.1371/journal.pone.0200912
work_keys_str_mv AT hakaksaqib anewsplitbasedsearchingforexactpatternmatchingfornaturaltexts
AT kamsinamirrudin anewsplitbasedsearchingforexactpatternmatchingfornaturaltexts
AT shivakumarapalaiahnakote anewsplitbasedsearchingforexactpatternmatchingfornaturaltexts
AT idnaidrismohdyamani anewsplitbasedsearchingforexactpatternmatchingfornaturaltexts
AT gilkargulshanamin anewsplitbasedsearchingforexactpatternmatchingfornaturaltexts
AT hakaksaqib newsplitbasedsearchingforexactpatternmatchingfornaturaltexts
AT kamsinamirrudin newsplitbasedsearchingforexactpatternmatchingfornaturaltexts
AT shivakumarapalaiahnakote newsplitbasedsearchingforexactpatternmatchingfornaturaltexts
AT idnaidrismohdyamani newsplitbasedsearchingforexactpatternmatchingfornaturaltexts
AT gilkargulshanamin newsplitbasedsearchingforexactpatternmatchingfornaturaltexts