Cargando…

Fast alignment-free sequence comparison using spaced-word frequencies

Motivation: Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignment...

Descripción completa

Detalles Bibliográficos
Autores principales:	Leimeister, Chris-Andre, Boden, Marcus, Horwege, Sebastian, Lindner, Sebastian, Morgenstern, Burkhard
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Oxford University Press 2014
Materias:	Original Papers
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4080745/ https://www.ncbi.nlm.nih.gov/pubmed/24700317 http://dx.doi.org/10.1093/bioinformatics/btu177

_version_	1782324031918178304
author	Leimeister, Chris-Andre Boden, Marcus Horwege, Sebastian Lindner, Sebastian Morgenstern, Burkhard
author_facet	Leimeister, Chris-Andre Boden, Marcus Horwege, Sebastian Lindner, Sebastian Morgenstern, Burkhard
author_sort	Leimeister, Chris-Andre
collection	PubMed
description	Motivation: Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free approaches work by comparing the word composition of sequences. A well-known problem with these methods is that neighbouring word matches are far from independent. Results: To reduce the statistical dependency between adjacent word matches, we propose to use ‘spaced words’, defined by patterns of ‘match’ and ‘don’t care’ positions, for alignment-free sequence comparison. We describe a fast implementation of this approach using recursive hashing and bit operations, and we show that further improvements can be achieved by using multiple patterns instead of single patterns. To evaluate our approach, we use spaced-word frequencies as a basis for fast phylogeny reconstruction. Using real-world and simulated sequence data, we demonstrate that our multiple-pattern approach produces better phylogenies than approaches relying on contiguous words. Availability and implementation: Our program is freely available at http://spaced.gobics.de/. Contact: chris.leimeister@stud.uni-goettingen.de Supplementary information: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-4080745
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-40807452014-07-03 Fast alignment-free sequence comparison using spaced-word frequencies Leimeister, Chris-Andre Boden, Marcus Horwege, Sebastian Lindner, Sebastian Morgenstern, Burkhard Bioinformatics Original Papers Motivation: Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free approaches work by comparing the word composition of sequences. A well-known problem with these methods is that neighbouring word matches are far from independent. Results: To reduce the statistical dependency between adjacent word matches, we propose to use ‘spaced words’, defined by patterns of ‘match’ and ‘don’t care’ positions, for alignment-free sequence comparison. We describe a fast implementation of this approach using recursive hashing and bit operations, and we show that further improvements can be achieved by using multiple patterns instead of single patterns. To evaluate our approach, we use spaced-word frequencies as a basis for fast phylogeny reconstruction. Using real-world and simulated sequence data, we demonstrate that our multiple-pattern approach produces better phylogenies than approaches relying on contiguous words. Availability and implementation: Our program is freely available at http://spaced.gobics.de/. Contact: chris.leimeister@stud.uni-goettingen.de Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2014-07-15 2014-04-03 /pmc/articles/PMC4080745/ /pubmed/24700317 http://dx.doi.org/10.1093/bioinformatics/btu177 Text en © The Author 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Papers Leimeister, Chris-Andre Boden, Marcus Horwege, Sebastian Lindner, Sebastian Morgenstern, Burkhard Fast alignment-free sequence comparison using spaced-word frequencies
title	Fast alignment-free sequence comparison using spaced-word frequencies
title_full	Fast alignment-free sequence comparison using spaced-word frequencies
title_fullStr	Fast alignment-free sequence comparison using spaced-word frequencies
title_full_unstemmed	Fast alignment-free sequence comparison using spaced-word frequencies
title_short	Fast alignment-free sequence comparison using spaced-word frequencies
title_sort	fast alignment-free sequence comparison using spaced-word frequencies
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4080745/ https://www.ncbi.nlm.nih.gov/pubmed/24700317 http://dx.doi.org/10.1093/bioinformatics/btu177
work_keys_str_mv	AT leimeisterchrisandre fastalignmentfreesequencecomparisonusingspacedwordfrequencies AT bodenmarcus fastalignmentfreesequencecomparisonusingspacedwordfrequencies AT horwegesebastian fastalignmentfreesequencecomparisonusingspacedwordfrequencies AT lindnersebastian fastalignmentfreesequencecomparisonusingspacedwordfrequencies AT morgensternburkhard fastalignmentfreesequencecomparisonusingspacedwordfrequencies

Fast alignment-free sequence comparison using spaced-word frequencies

Ejemplares similares