Evolutionary Distances in the Twilight Zone—A Rational Kernel Approach

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract...

Bibliographic Details
Main Authors: Schwarz, Roland F., Fletcher, William, Förster, Frank, Merget, Benjamin, Wolf, Matthias, Schultz, Jörg, Markowetz, Florian
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013127/
https://www.ncbi.nlm.nih.gov/pubmed/21209825
http://dx.doi.org/10.1371/journal.pone.0015788
_version_ 1782195235146694656
author Schwarz, Roland F.
Fletcher, William
Förster, Frank
Merget, Benjamin
Wolf, Matthias
Schultz, Jörg
Markowetz, Florian
author_sort Schwarz, Roland F.
collection PubMed
description Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.
format Text
id pubmed-3013127
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3013127 2011-01-05 Evolutionary Distances in the Twilight Zone—A Rational Kernel Approach Schwarz, Roland F. Fletcher, William Förster, Frank Merget, Benjamin Wolf, Matthias Schultz, Jörg Markowetz, Florian PLoS One Research Article Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets. Public Library of Science 2010-12-31 /pmc/articles/PMC3013127/ /pubmed/21209825 http://dx.doi.org/10.1371/journal.pone.0015788 Text en Schwarz et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Evolutionary Distances in the Twilight Zone—A Rational Kernel Approach
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013127/
https://www.ncbi.nlm.nih.gov/pubmed/21209825
http://dx.doi.org/10.1371/journal.pone.0015788