Cargando…

Fast algorithms for computing sequence distances by exhaustive substring composition

The increasing throughput of sequencing raises growing needs for methods of sequence analysis and comparison on a genomic scale, notably, in connection with phylogenetic tree reconstruction. Such needs are hardly fulfilled by the more traditional measures of sequence similarity and distance, like st...

Descripción completa

Detalles Bibliográficos
Autores principales:	Apostolico, Alberto, Denas, Olgert
Formato:	Texto
Lenguaje:	English
Publicado:	BioMed Central 2008
Materias:	Research
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2615014/ https://www.ncbi.nlm.nih.gov/pubmed/18957094 http://dx.doi.org/10.1186/1748-7188-3-13

_version_	1782163273861300224
author	Apostolico, Alberto Denas, Olgert
author_facet	Apostolico, Alberto Denas, Olgert
author_sort	Apostolico, Alberto
collection	PubMed
description	The increasing throughput of sequencing raises growing needs for methods of sequence analysis and comparison on a genomic scale, notably, in connection with phylogenetic tree reconstruction. Such needs are hardly fulfilled by the more traditional measures of sequence similarity and distance, like string edit and gene rearrangement, due to a mixture of epistemological and computational problems. Alternative measures, based on the subword composition of sequences, have emerged in recent years and proved to be both fast and effective in a variety of tested cases. The common denominator of such measures is an underlying information theoretic notion of relative compressibility. Their viability depends critically on computational cost. The present paper describes as a paradigm the extension and efficient implementation of one of the methods in this class. The method is based on the comparison of the frequencies of all subwords in the two input sequences, where frequencies are suitably adjusted to take into account the statistical background.
format	Text
id	pubmed-2615014
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-26150142009-01-12 Fast algorithms for computing sequence distances by exhaustive substring composition Apostolico, Alberto Denas, Olgert Algorithms Mol Biol Research The increasing throughput of sequencing raises growing needs for methods of sequence analysis and comparison on a genomic scale, notably, in connection with phylogenetic tree reconstruction. Such needs are hardly fulfilled by the more traditional measures of sequence similarity and distance, like string edit and gene rearrangement, due to a mixture of epistemological and computational problems. Alternative measures, based on the subword composition of sequences, have emerged in recent years and proved to be both fast and effective in a variety of tested cases. The common denominator of such measures is an underlying information theoretic notion of relative compressibility. Their viability depends critically on computational cost. The present paper describes as a paradigm the extension and efficient implementation of one of the methods in this class. The method is based on the comparison of the frequencies of all subwords in the two input sequences, where frequencies are suitably adjusted to take into account the statistical background. BioMed Central 2008-10-28 /pmc/articles/PMC2615014/ /pubmed/18957094 http://dx.doi.org/10.1186/1748-7188-3-13 Text en Copyright © 2008 Apostolico and Denas; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Apostolico, Alberto Denas, Olgert Fast algorithms for computing sequence distances by exhaustive substring composition
title	Fast algorithms for computing sequence distances by exhaustive substring composition
title_full	Fast algorithms for computing sequence distances by exhaustive substring composition
title_fullStr	Fast algorithms for computing sequence distances by exhaustive substring composition
title_full_unstemmed	Fast algorithms for computing sequence distances by exhaustive substring composition
title_short	Fast algorithms for computing sequence distances by exhaustive substring composition
title_sort	fast algorithms for computing sequence distances by exhaustive substring composition
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2615014/ https://www.ncbi.nlm.nih.gov/pubmed/18957094 http://dx.doi.org/10.1186/1748-7188-3-13
work_keys_str_mv	AT apostolicoalberto fastalgorithmsforcomputingsequencedistancesbyexhaustivesubstringcomposition AT denasolgert fastalgorithmsforcomputingsequencedistancesbyexhaustivesubstringcomposition

Fast algorithms for computing sequence distances by exhaustive substring composition

Ejemplares similares