Cargando…

Linear-time computation of minimal absent words using suffix array

BACKGROUND: An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their computation also provides a fast alternative for me...

Descripción completa

Detalles Bibliográficos
Autores principales:	Barton, Carl, Heliou, Alice, Mouchard, Laurent, Pissis, Solon P
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2014
Materias:	Research Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4297395/ https://www.ncbi.nlm.nih.gov/pubmed/25526884 http://dx.doi.org/10.1186/s12859-014-0388-9

_version_	1782353141422882816
author	Barton, Carl Heliou, Alice Mouchard, Laurent Pissis, Solon P
author_facet	Barton, Carl Heliou, Alice Mouchard, Laurent Pissis, Solon P
author_sort	Barton, Carl
collection	PubMed
description	BACKGROUND: An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their computation also provides a fast alternative for measuring approximation in sequence comparison. There exists an [Formula: see text] -time and [Formula: see text] -space algorithm for computing all minimal absent words on a fixed-sized alphabet based on the construction of suffix automata (Crochemore et al., 1998). No implementation of this algorithm is publicly available. There also exists an [Formula: see text] -time and [Formula: see text] -space algorithm for the same problem based on the construction of suffix arrays (Pinho et al., 2009). An implementation of this algorithm was also provided by the authors and is currently the fastest available. RESULTS: Our contribution in this article is twofold: first, we bridge this unpleasant gap by presenting an [Formula: see text] -time and [Formula: see text] -space algorithm for computing all minimal absent words based on the construction of suffix arrays; and second, we provide the respective implementation of this algorithm. Experimental results, using real and synthetic data, show that this implementation outperforms the one by Pinho et al. The open-source code of our implementation is freely available at http://github.com/solonas13/maw. CONCLUSIONS: Classical notions for sequence comparison are increasingly being replaced by other similarity measures that refer to the composition of sequences in terms of their constituent patterns. One such measure is the minimal absent words. In this article, we present a new linear-time and linear-space algorithm for the computation of minimal absent words based on the suffix array.
format	Online Article Text
id	pubmed-4297395
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-42973952015-02-03 Linear-time computation of minimal absent words using suffix array Barton, Carl Heliou, Alice Mouchard, Laurent Pissis, Solon P BMC Bioinformatics Research Article BACKGROUND: An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their computation also provides a fast alternative for measuring approximation in sequence comparison. There exists an [Formula: see text] -time and [Formula: see text] -space algorithm for computing all minimal absent words on a fixed-sized alphabet based on the construction of suffix automata (Crochemore et al., 1998). No implementation of this algorithm is publicly available. There also exists an [Formula: see text] -time and [Formula: see text] -space algorithm for the same problem based on the construction of suffix arrays (Pinho et al., 2009). An implementation of this algorithm was also provided by the authors and is currently the fastest available. RESULTS: Our contribution in this article is twofold: first, we bridge this unpleasant gap by presenting an [Formula: see text] -time and [Formula: see text] -space algorithm for computing all minimal absent words based on the construction of suffix arrays; and second, we provide the respective implementation of this algorithm. Experimental results, using real and synthetic data, show that this implementation outperforms the one by Pinho et al. The open-source code of our implementation is freely available at http://github.com/solonas13/maw. CONCLUSIONS: Classical notions for sequence comparison are increasingly being replaced by other similarity measures that refer to the composition of sequences in terms of their constituent patterns. One such measure is the minimal absent words. In this article, we present a new linear-time and linear-space algorithm for the computation of minimal absent words based on the suffix array. BioMed Central 2014-12-20 /pmc/articles/PMC4297395/ /pubmed/25526884 http://dx.doi.org/10.1186/s12859-014-0388-9 Text en © Barton et al.; licensee BioMed Central. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Article Barton, Carl Heliou, Alice Mouchard, Laurent Pissis, Solon P Linear-time computation of minimal absent words using suffix array
title	Linear-time computation of minimal absent words using suffix array
title_full	Linear-time computation of minimal absent words using suffix array
title_fullStr	Linear-time computation of minimal absent words using suffix array
title_full_unstemmed	Linear-time computation of minimal absent words using suffix array
title_short	Linear-time computation of minimal absent words using suffix array
title_sort	linear-time computation of minimal absent words using suffix array
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4297395/ https://www.ncbi.nlm.nih.gov/pubmed/25526884 http://dx.doi.org/10.1186/s12859-014-0388-9
work_keys_str_mv	AT bartoncarl lineartimecomputationofminimalabsentwordsusingsuffixarray AT helioualice lineartimecomputationofminimalabsentwordsusingsuffixarray AT mouchardlaurent lineartimecomputationofminimalabsentwordsusingsuffixarray AT pississolonp lineartimecomputationofminimalabsentwordsusingsuffixarray

Linear-time computation of minimal absent words using suffix array

Ejemplares similares