Cargando…

Using the Sadakane Compressed Suffix Tree to Solve the All-Pairs Suffix-Prefix Problem

The all-pairs suffix-prefix matching problem is a basic problem in string processing. It has an application in the de novo genome assembly task, which is one of the major bioinformatics problems. Due to the large size of the input data, it is crucial to use fast and space efficient solutions. In thi...

Descripción completa

Detalles Bibliográficos
Autores principales: Haj Rachid, Maan, Malluhi, Qutaibah, Abouelhoda, Mohamed
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Hindawi Publishing Corporation 2014
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4009283/
https://www.ncbi.nlm.nih.gov/pubmed/24834435
http://dx.doi.org/10.1155/2014/745298
_version_ 1782479742764580864
author Haj Rachid, Maan
Malluhi, Qutaibah
Abouelhoda, Mohamed
author_facet Haj Rachid, Maan
Malluhi, Qutaibah
Abouelhoda, Mohamed
author_sort Haj Rachid, Maan
collection PubMed
description The all-pairs suffix-prefix matching problem is a basic problem in string processing. It has an application in the de novo genome assembly task, which is one of the major bioinformatics problems. Due to the large size of the input data, it is crucial to use fast and space efficient solutions. In this paper, we present a space-economical solution to this problem using the generalized Sadakane compressed suffix tree. Furthermore, we present a parallel algorithm to provide more speed for shared memory computers. Our sequential and parallel algorithms are optimized by exploiting features of the Sadakane compressed index data structure. Experimental results show that our solution based on the Sadakane's compressed index consumes significantly less space than the ones based on noncompressed data structures like the suffix tree and the enhanced suffix array. Our experimental results show that our parallel algorithm is efficient and scales well with increasing number of processors.
format Online
Article
Text
id pubmed-4009283
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-40092832014-05-15 Using the Sadakane Compressed Suffix Tree to Solve the All-Pairs Suffix-Prefix Problem Haj Rachid, Maan Malluhi, Qutaibah Abouelhoda, Mohamed Biomed Res Int Research Article The all-pairs suffix-prefix matching problem is a basic problem in string processing. It has an application in the de novo genome assembly task, which is one of the major bioinformatics problems. Due to the large size of the input data, it is crucial to use fast and space efficient solutions. In this paper, we present a space-economical solution to this problem using the generalized Sadakane compressed suffix tree. Furthermore, we present a parallel algorithm to provide more speed for shared memory computers. Our sequential and parallel algorithms are optimized by exploiting features of the Sadakane compressed index data structure. Experimental results show that our solution based on the Sadakane's compressed index consumes significantly less space than the ones based on noncompressed data structures like the suffix tree and the enhanced suffix array. Our experimental results show that our parallel algorithm is efficient and scales well with increasing number of processors. Hindawi Publishing Corporation 2014 2014-04-16 /pmc/articles/PMC4009283/ /pubmed/24834435 http://dx.doi.org/10.1155/2014/745298 Text en Copyright © 2014 Maan Haj Rachid et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Haj Rachid, Maan
Malluhi, Qutaibah
Abouelhoda, Mohamed
Using the Sadakane Compressed Suffix Tree to Solve the All-Pairs Suffix-Prefix Problem
title Using the Sadakane Compressed Suffix Tree to Solve the All-Pairs Suffix-Prefix Problem
title_full Using the Sadakane Compressed Suffix Tree to Solve the All-Pairs Suffix-Prefix Problem
title_fullStr Using the Sadakane Compressed Suffix Tree to Solve the All-Pairs Suffix-Prefix Problem
title_full_unstemmed Using the Sadakane Compressed Suffix Tree to Solve the All-Pairs Suffix-Prefix Problem
title_short Using the Sadakane Compressed Suffix Tree to Solve the All-Pairs Suffix-Prefix Problem
title_sort using the sadakane compressed suffix tree to solve the all-pairs suffix-prefix problem
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4009283/
https://www.ncbi.nlm.nih.gov/pubmed/24834435
http://dx.doi.org/10.1155/2014/745298
work_keys_str_mv AT hajrachidmaan usingthesadakanecompressedsuffixtreetosolvetheallpairssuffixprefixproblem
AT malluhiqutaibah usingthesadakanecompressedsuffixtreetosolvetheallpairssuffixprefixproblem
AT abouelhodamohamed usingthesadakanecompressedsuffixtreetosolvetheallpairssuffixprefixproblem