
Estimating evolutionary distances between genomic sequences from spaced-word matches

Bibliographic details
Main authors: Morgenstern, Burkhard; Zhu, Bingyao; Horwege, Sebastian; Leimeister, Chris André
Format: Online article (text)
Language: English
Published: BioMed Central 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4327811/
https://www.ncbi.nlm.nih.gov/pubmed/25685176
http://dx.doi.org/10.1186/s13015-015-0032-x
collection PubMed
description Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we propose a simple estimator d(N) of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. We show that this distance function is more accurate than other distance measures that are used by alignment-free methods. In addition, we calculate the variance of the normalized number N of (spaced) word matches. We show that the variance of N is smaller for spaced words than for contiguous words, and that the variance is further reduced if our spaced-words approach is used with multiple patterns of ‘match positions’ and ‘don’t care positions’. Our software is available online and as downloadable source code at: http://spaced.gobics.de/.
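The abstract describes the estimator d(N) only in outline, so the following Python is a minimal sketch under simplifying assumptions of our own. It is not the estimator implemented in the authors' software. The sketch assumes two DNA sequences of equal length L without insertions or deletions and a single binary pattern of length l in a hypothetical notation where '1' is a match position and '0' a don't-care position, with weight k equal to the number of '1's. Homologous positions are assumed to match with probability p and unrelated positions with probability q. Under these assumptions the expected count is roughly N ≈ m·p^k + m(m-1)·q^k with m = L - l + 1. The sketch counts N, solves this relation for p, and converts the mismatch rate 1 - p into a distance using the standard Jukes-Cantor correction, which is also an assumption here. The function names are hypothetical.

```python
import math
from collections import Counter


def spaced_words(seq: str, pattern: str) -> Counter:
    """Count the spaced words of `seq` under a binary pattern.

    '1' marks a match position and '0' a don't-care position; the spaced
    word starting at position i keeps only the characters at match positions.
    """
    span = len(pattern)
    match_pos = [j for j, c in enumerate(pattern) if c == "1"]
    return Counter(
        "".join(seq[i + j] for j in match_pos)
        for i in range(len(seq) - span + 1)
    )


def count_spaced_word_matches(s1: str, s2: str, pattern: str) -> int:
    """N: number of position pairs (i, j) whose spaced words are identical."""
    w1 = spaced_words(s1, pattern)
    w2 = spaced_words(s2, pattern)
    # Counter returns 0 for absent keys, so words unique to s1 add nothing.
    return sum(n * w2[word] for word, n in w1.items())


def estimate_distance(n_matches: float, seq_len: int, pattern: str,
                      q: float = 0.25) -> float:
    """Hypothetical simplified d(N), not the published estimator.

    Assumes equal-length sequences without indels; q is the probability that
    two unrelated positions match (0.25 for uniform base frequencies).
    """
    k = pattern.count("1")              # weight: number of match positions
    m = seq_len - len(pattern) + 1      # pattern positions per sequence
    background = m * (m - 1) * q ** k   # expected matches between unrelated positions
    homologous = (n_matches - background) / m
    if homologous <= 0:
        return math.inf                 # too few matches to estimate p
    p = min(homologous ** (1.0 / k), 1.0)   # estimated per-site match probability
    mismatch = 1.0 - p
    if mismatch >= 0.75:
        return math.inf                 # Jukes-Cantor correction undefined
    return -0.75 * math.log(1.0 - 4.0 * mismatch / 3.0)


s = "ACGTACGT"
print(count_spaced_word_matches(s, s, "101"))  # 10
```

With a small weight k, the background term m(m-1)·q^k can swamp the homologous term, so in this sketch k should be large enough that q^k is small relative to 1/m.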
id pubmed-4327811
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source pubmed-4327811 (2015-02-14): Algorithms Mol Biol, Research. BioMed Central, 2015-02-11
rights © Morgenstern et al.; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research