
Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction

Phylogenetic tree reconstruction requires construction of a multiple sequence alignment (MSA) from sequences. Computationally, it is difficult to achieve an optimal MSA for many sequences. Moreover, even if an optimal MSA is obtained, it may not be the true MSA that reflects the evolutionary history...


Bibliographic Details
Main authors: Yang, Kuan, Zhang, Liqing
Format: Text
Language: English
Published: Oxford University Press 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2275138/
https://www.ncbi.nlm.nih.gov/pubmed/18296485
http://dx.doi.org/10.1093/nar/gkn075
_version_ 1782151823155527680
author Yang, Kuan
Zhang, Liqing
collection PubMed
description Phylogenetic tree reconstruction requires construction of a multiple sequence alignment (MSA) from sequences. Computationally, it is difficult to achieve an optimal MSA for many sequences. Moreover, even if an optimal MSA is obtained, it may not be the true MSA that reflects the evolutionary history of the underlying sequences. Errors can therefore be introduced during MSA construction, which in turn affect the subsequent phylogenetic tree construction. To circumvent this issue, we extend the application of the k-tuple distance to phylogenetic tree reconstruction. The k-tuple distance between two sequences is the sum of the differences in frequency, over all possible tuples of length k, between the sequences, and it can be estimated without MSAs. It has traditionally been used to build a fast ‘guide tree’ to assist the construction of MSAs. Using 1470 simulated sets of sequences generated under different evolutionary scenarios, and both neighbor-joining and BioNJ trees, we compared the performance of the k-tuple distance with four commonly used distance estimators: Jukes–Cantor, Kimura, F84 and Tamura–Nei. These four are model-based distance estimators, as each of them assumes a specific substitution model in order to compute the distance between a pair of already aligned sequences. Results show that trees constructed from the k-tuple distance are more accurate than those from the other distances most of the time; when the divergence between the underlying sequences is high, tree accuracy with the k-tuple distance can be twice that of the other estimators, or higher. Furthermore, as the k-tuple distance avoids the need for constructing an MSA, it can save a tremendous amount of time in phylogenetic tree reconstruction when the data include a large number of sequences.
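The k-tuple distance defined in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the default k=3, the use of overlapping tuples, the normalisation of counts to relative frequencies, and the choice between absolute and squared differences are all assumptions made for illustration, since the record only states that the distance sums frequency differences over all tuples of length k.

```python
from collections import Counter


def ktuple_counts(seq, k):
    """Count every overlapping tuple (k-mer) of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ktuple_distance(seq_a, seq_b, k=3, squared=False):
    """Sum of per-tuple frequency differences between two unaligned sequences.

    Assumption: counts are normalised to relative frequencies, and the
    difference is taken as an absolute value (or squared if squared=True).
    """
    counts_a = ktuple_counts(seq_a, k)
    counts_b = ktuple_counts(seq_b, k)
    # Guard against sequences shorter than k, which yield no tuples.
    total_a = sum(counts_a.values()) or 1
    total_b = sum(counts_b.values()) or 1

    distance = 0.0
    # Tuples absent from both sequences contribute zero, so iterating over
    # the union of observed tuples covers all possible tuples of length k.
    for tup in counts_a.keys() | counts_b.keys():
        diff = counts_a[tup] / total_a - counts_b[tup] / total_b
        distance += diff * diff if squared else abs(diff)
    return distance


def distance_matrix(seqs, k=3):
    """Pairwise k-tuple distances, as a symmetric list-of-lists matrix."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = ktuple_distance(seqs[i], seqs[j], k)
    return matrix
```

A matrix produced this way could be passed to any neighbor-joining or BioNJ implementation; no alignment step is involved, which is the time saving the description refers to.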
format Text
id pubmed-2275138
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2275138 2008-04-07 Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction Yang, Kuan Zhang, Liqing Nucleic Acids Res Methods Online Oxford University Press 2008-03 2008-02-22 /pmc/articles/PMC2275138/ /pubmed/18296485 http://dx.doi.org/10.1093/nar/gkn075 Text en © 2008 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2275138/
https://www.ncbi.nlm.nih.gov/pubmed/18296485
http://dx.doi.org/10.1093/nar/gkn075