Phylogeny reconstruction based on the length distribution of k-mismatch common substrings

Bibliographic Details
Main authors: Morgenstern, Burkhard; Schöbel, Svenja; Leimeister, Chris-André
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5724348/
https://www.ncbi.nlm.nih.gov/pubmed/29238399
http://dx.doi.org/10.1186/s13015-017-0118-8
Description
Summary: BACKGROUND: Various approaches to alignment-free sequence comparison are based on the length of exact or inexact word matches between pairs of input sequences. Haubold et al. (J Comput Biol 16:1487–1500, 2009) showed how the average number of substitutions per position between two DNA sequences can be estimated based on the average length of exact common substrings. RESULTS: In this paper, we study the length distribution of k-mismatch common substrings between two sequences. We show that the number of substitutions per position can be accurately estimated from the position of a local maximum in the length distribution of their k-mismatch common substrings.
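
To make the term "length distribution of k-mismatch common substrings" concrete, here is a minimal Python sketch. It is a naive illustration and not the authors' implementation. It assumes one plausible reading of the definition: for every pair of start positions in the two sequences, the match is extended until the (k+1)-th mismatch or the end of either sequence, and the resulting lengths are tallied. The function names k_mismatch_length and length_distribution are hypothetical, and the step that estimates substitutions per position from the local maximum is not reproduced.

```python
from collections import Counter


def k_mismatch_length(s, t, i, j, k):
    """Length of the longest stretch starting at s[i] and t[j]
    that contains at most k mismatches (assumed definition)."""
    mismatches = 0
    length = 0
    while i + length < len(s) and j + length < len(t):
        if s[i + length] != t[j + length]:
            if mismatches == k:
                break  # this would be mismatch number k + 1
            mismatches += 1
        length += 1
    return length


def length_distribution(s, t, k):
    """Count how often each k-mismatch match length occurs over
    all pairs of start positions (quadratic time; small inputs only)."""
    counts = Counter()
    for i in range(len(s)):
        for j in range(len(t)):
            counts[k_mismatch_length(s, t, i, j, k)] += 1
    return counts
```

For example, length_distribution("ACGTAC", "ACTTAG", 1) returns a Counter that maps each observed match length to the number of position pairs producing it. On real sequence data one would inspect where this histogram peaks.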