Evolutionary Distances in the Twilight Zone—A Rational Kernel Approach
Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract...
Main authors: | Schwarz, Roland F.; Fletcher, William; Förster, Frank; Merget, Benjamin; Wolf, Matthias; Schultz, Jörg; Markowetz, Florian |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science, 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013127/ https://www.ncbi.nlm.nih.gov/pubmed/21209825 http://dx.doi.org/10.1371/journal.pone.0015788 |
author | Schwarz, Roland F.; Fletcher, William; Förster, Frank; Merget, Benjamin; Wolf, Matthias; Schultz, Jörg; Markowetz, Florian |
---|---|
collection | PubMed |
description | Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets. |
format | Text |
id | pubmed-3013127 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3013127 2011-01-05 Evolutionary Distances in the Twilight Zone—A Rational Kernel Approach Schwarz, Roland F.; Fletcher, William; Förster, Frank; Merget, Benjamin; Wolf, Matthias; Schultz, Jörg; Markowetz, Florian PLoS One Research Article Public Library of Science 2010-12-31 /pmc/articles/PMC3013127/ /pubmed/21209825 http://dx.doi.org/10.1371/journal.pone.0015788 Text en Schwarz et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Evolutionary Distances in the Twilight Zone—A Rational Kernel Approach |
topic | Research Article |
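The description states that the similarity score is positive semi-definite (PSD) and is used to derive an evolutionary distance. The sketch below is not the authors' finite-state transducer construction. It substitutes a much simpler PSD kernel, a k-mer kernel weighted by a hypothetical letter similarity (1 for identical letters, `s` otherwise) that models substitutions only crudely and ignores indels. Its purpose is to show one standard way any PSD kernel K yields a distance: the feature-space distance sqrt(K(x,x) + K(y,y) - 2K(x,y)), applied here to the cosine-normalized kernel. The function names and the defaults `k = 3` and `s = 0.25` are illustrative assumptions.

```python
import itertools
import math


def letter_similarity(a: str, b: str, s: float) -> float:
    """Hypothetical residue similarity: 1 for identical letters, s otherwise.

    For 0 <= s <= 1 the matrix (1 - s) * I + s * J is positive semi-definite,
    so this is a valid kernel on single letters.
    """
    return 1.0 if a == b else s


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping substrings of length k."""
    if len(seq) < k:
        raise ValueError(f"sequence {seq!r} is shorter than k={k}")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_kernel(x: str, y: str, k: int = 3, s: float = 0.25) -> float:
    """Sum over all k-mer pairs (u, v) of prod_i letter_similarity(u[i], v[i]).

    A product of PSD kernels over positions is PSD, and summing a PSD kernel
    over all pairs of parts keeps it PSD, so the result is a valid kernel.
    """
    total = 0.0
    for u, v in itertools.product(kmers(x, k), kmers(y, k)):
        term = 1.0
        for a, b in zip(u, v):
            term *= letter_similarity(a, b, s)
        total += term
    return total


def kernel_distance(x: str, y: str, k: int = 3, s: float = 0.25) -> float:
    """Feature-space distance of the cosine-normalized kernel.

    With K'(x, y) = K(x, y) / sqrt(K(x, x) * K(y, y)), the induced distance
    sqrt(K'(x, x) + K'(y, y) - 2 K'(x, y)) simplifies to sqrt(2 - 2 K'(x, y)).
    """
    kxx = kmer_kernel(x, x, k, s)
    kyy = kmer_kernel(y, y, k, s)
    kxy = kmer_kernel(x, y, k, s)
    cosine = kxy / math.sqrt(kxx * kyy)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 2.0 - 2.0 * cosine))


if __name__ == "__main__":
    seqs = {"s1": "ACGTACGTGA", "s2": "ACGTACGTTA", "s3": "TTGCATGCCA"}
    for p, q in itertools.combinations(seqs, 2):
        print(p, q, round(kernel_distance(seqs[p], seqs[q]), 3))
```

With `s = 0` the kernel reduces to the plain k-mer spectrum kernel, which is an alignment-free method. The off-diagonal weight `s` is only a rough stand-in for a substitution model, and indels are not modeled at all. The description says the transducer-based score addresses both.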