Estimating evolutionary distances between genomic sequences from spaced-word matches
Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we prop...
Main authors: | Morgenstern, Burkhard; Zhu, Bingyao; Horwege, Sebastian; Leimeister, Chris André |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4327811/ https://www.ncbi.nlm.nih.gov/pubmed/25685176 http://dx.doi.org/10.1186/s13015-015-0032-x |
_version_ | 1782357159818821632 |
---|---|
author | Morgenstern, Burkhard Zhu, Bingyao Horwege, Sebastian Leimeister, Chris André |
author_facet | Morgenstern, Burkhard Zhu, Bingyao Horwege, Sebastian Leimeister, Chris André |
author_sort | Morgenstern, Burkhard |
collection | PubMed |
description | Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we propose a simple estimator d(N) of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. We show that this distance function is more accurate than other distance measures that are used by alignment-free methods. In addition, we calculate the variance of the normalized number N of (spaced) word matches. We show that the variance of N is smaller for spaced words than for contiguous words, and that the variance is further reduced if our spaced-words approach is used with multiple patterns of ‘match positions’ and ‘don’t care positions’. Our software is available online and as downloadable source code at: http://spaced.gobics.de/. |
format | Online Article Text |
id | pubmed-4327811 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4327811 2015-02-14 Estimating evolutionary distances between genomic sequences from spaced-word matches Morgenstern, Burkhard Zhu, Bingyao Horwege, Sebastian Leimeister, Chris André Algorithms Mol Biol Research Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we propose a simple estimator d(N) of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. We show that this distance function is more accurate than other distance measures that are used by alignment-free methods. In addition, we calculate the variance of the normalized number N of (spaced) word matches. We show that the variance of N is smaller for spaced words than for contiguous words, and that the variance is further reduced if our spaced-words approach is used with multiple patterns of ‘match positions’ and ‘don’t care positions’. Our software is available online and as downloadable source code at: http://spaced.gobics.de/. BioMed Central 2015-02-11 /pmc/articles/PMC4327811/ /pubmed/25685176 http://dx.doi.org/10.1186/s13015-015-0032-x Text en © Morgenstern et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Morgenstern, Burkhard Zhu, Bingyao Horwege, Sebastian Leimeister, Chris André Estimating evolutionary distances between genomic sequences from spaced-word matches |
title | Estimating evolutionary distances between genomic sequences from spaced-word matches |
title_full | Estimating evolutionary distances between genomic sequences from spaced-word matches |
title_fullStr | Estimating evolutionary distances between genomic sequences from spaced-word matches |
title_full_unstemmed | Estimating evolutionary distances between genomic sequences from spaced-word matches |
title_short | Estimating evolutionary distances between genomic sequences from spaced-word matches |
title_sort | estimating evolutionary distances between genomic sequences from spaced-word matches |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4327811/ https://www.ncbi.nlm.nih.gov/pubmed/25685176 http://dx.doi.org/10.1186/s13015-015-0032-x |
work_keys_str_mv | AT morgensternburkhard estimatingevolutionarydistancesbetweengenomicsequencesfromspacedwordmatches AT zhubingyao estimatingevolutionarydistancesbetweengenomicsequencesfromspacedwordmatches AT horwegesebastian estimatingevolutionarydistancesbetweengenomicsequencesfromspacedwordmatches AT leimeisterchrisandre estimatingevolutionarydistancesbetweengenomicsequencesfromspacedwordmatches |
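
The description above refers to the number N of (spaced) word matches between two DNA sequences under a pattern of 'match positions' and 'don't care positions'. The following Python sketch illustrates one way such a count could be obtained. It is not the authors' software: the function names, the encoding of a pattern as a string of `1` (match) and `0` (don't care), and the choice to count every pair of positions whose spaced words agree are assumptions made for illustration. The estimator d(N) itself is not reproduced here, because its formula is not given in this record.

```python
from collections import Counter


def spaced_words(seq, pattern):
    """Yield the spaced word at every start position of seq.

    pattern is a string over {'1', '0'} (an assumed encoding):
    '1' marks a match position, '0' a don't-care position.
    """
    match_pos = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        # Keep only the characters at match positions.
        yield "".join(seq[start + i] for i in match_pos)


def count_spaced_word_matches(seq1, seq2, pattern):
    """Count position pairs (i, j) whose spaced words are identical.

    Assumption: N counts all such pairs, so a word occurring a times
    in seq1 and b times in seq2 contributes a * b matches.
    """
    c1 = Counter(spaced_words(seq1, pattern))
    c2 = Counter(spaced_words(seq2, pattern))
    return sum(n * c2[w] for w, n in c1.items())


if __name__ == "__main__":
    s1, s2 = "ACGTACGT", "ACTTACGA"
    print(count_spaced_word_matches(s1, s2, "101"))  # spaced pattern
    print(count_spaced_word_matches(s1, s2, "111"))  # contiguous words
```

A contiguous word of length k corresponds to a pattern made only of `1` characters, so the same function covers both cases that the description compares.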