
Analysis on the reconstruction accuracy of the Fitch method for inferring ancestral states

BACKGROUND: As one of the most widely used parsimony methods for ancestral reconstruction, the Fitch method minimizes the total number of hypothetical substitutions along all branches of a tree to explain the evolution of a character. Due to the extensive usage of this method, it has become a scient...


Bibliographic Details
Main Authors: Yang, Jialiang, Li, Jun, Dong, Liuhuan, Grünewald, Stefan
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3030536/
https://www.ncbi.nlm.nih.gov/pubmed/21226965
http://dx.doi.org/10.1186/1471-2105-12-18
_version_ 1782197268370161664
author Yang, Jialiang
Li, Jun
Dong, Liuhuan
Grünewald, Stefan
collection PubMed
description BACKGROUND: As one of the most widely used parsimony methods for ancestral reconstruction, the Fitch method minimizes the total number of hypothetical substitutions along all branches of a tree to explain the evolution of a character. Because the method is so widely used, its reconstruction accuracy has become a subject of study in recent years. However, most studies are restricted to 2-state evolutionary models, and a study of higher-state models is needed, since DNA sequences are 4-state series and protein sequences have 20 states. RESULTS: In this paper, the ambiguous and unambiguous reconstruction accuracies of the Fitch method are studied for N-state evolutionary models. Given an arbitrary phylogenetic tree, a recurrence system is first presented to calculate the two accuracies iteratively. Because the complete binary tree and the comb-shaped tree are the two extremal evolutionary tree topologies with respect to balance, we focus on the reconstruction accuracies on these two topologies and analyze their asymptotic properties. Then, 1000 Yule trees with 1024 leaves are generated and analyzed to simulate real evolutionary scenarios. It is known that more taxa do not necessarily increase the reconstruction accuracies under 2-state models. We also test whether this holds under N-state models. CONCLUSIONS: In a large tree with many leaves, the reconstruction accuracies of using all taxa are sometimes lower than those of using a leaf subset under N-state models. For complete binary trees, there always exists an equilibrium interval [a, b] of conservation probability in which the limiting ambiguous reconstruction accuracy equals the probability of randomly picking a state. The value b decreases as the number of states increases, and it appears to converge. When the conservation probability is greater than b, the reconstruction accuracies of the Fitch method increase rapidly. The reconstruction accuracies on 1000 simulated Yule trees exhibit similar behavior. For comb-shaped trees, the limiting reconstruction accuracies of using all taxa are always less than or equal to those of using the nearest root-to-leaf path when the conservation probability is not less than [Formula: see text]. As a result, more taxa are suggested for ancestral reconstruction when the tree topology is balanced and the sequences are highly similar, and a few taxa close to the root are recommended otherwise.
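The quantities named in the description can be illustrated with a small Monte Carlo sketch. It is not the authors' recurrence system. It assumes a complete binary tree, a symmetric N-state model in which each edge keeps its state with conservation probability `p` and otherwise moves to one of the other states uniformly at random, and it takes the unambiguous accuracy to be the probability that the Fitch root set is exactly the true root state, and the ambiguous accuracy to be the probability of hitting the true state when one state is drawn uniformly from the Fitch root set. All function names (`fitch_root_set`, `evolve_leaves`, `estimate_accuracies`) and these working definitions are assumptions made for illustration.

```python
import random


def fitch_root_set(leaf_states):
    """Bottom-up Fitch pass on a complete binary tree.

    leaf_states: leaf states in left-to-right order; the length must be a
    power of two. Returns the set of most-parsimonious root states.
    """
    level = [frozenset([s]) for s in leaf_states]
    while len(level) > 1:
        parents = []
        for left, right in zip(level[0::2], level[1::2]):
            common = left & right
            # Fitch rule: intersection if non-empty, otherwise union.
            parents.append(common if common else left | right)
        level = parents
    return level[0]


def evolve_leaves(root_state, depth, n_states, p, rng):
    """Evolve one character from the root down to the 2**depth leaves.

    Assumed model: on every edge the state is conserved with probability p,
    otherwise it changes to one of the other n_states - 1 states uniformly.
    """
    level = [root_state]
    for _ in range(depth):
        children = []
        for state in level:
            for _child in range(2):
                if rng.random() < p:
                    children.append(state)
                else:
                    other = rng.randrange(n_states - 1)
                    # Skip over the current state so the state really changes.
                    children.append(other if other < state else other + 1)
        level = children
    return level


def estimate_accuracies(depth, n_states, p, trials=2000, seed=0):
    """Return (unambiguous, ambiguous) accuracy estimates by simulation."""
    rng = random.Random(seed)
    unambiguous = 0.0
    ambiguous = 0.0
    for _ in range(trials):
        root = rng.randrange(n_states)
        leaves = evolve_leaves(root, depth, n_states, p, rng)
        root_set = fitch_root_set(leaves)
        if root in root_set:
            ambiguous += 1.0 / len(root_set)
            if len(root_set) == 1:
                unambiguous += 1.0
    return unambiguous / trials, ambiguous / trials
```

For example, `estimate_accuracies(10, 4, 0.9)` would approximate both accuracies for a 4-state character on a complete binary tree with 1024 leaves. Sweeping `p` with such a call is one way to look for the equilibrium interval described above, though the estimates carry sampling noise that the paper's exact recurrences avoid.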
format Text
id pubmed-3030536
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3030536 2011-01-31 Analysis on the reconstruction accuracy of the Fitch method for inferring ancestral states Yang, Jialiang Li, Jun Dong, Liuhuan Grünewald, Stefan BMC Bioinformatics Research Article BioMed Central 2011-01-13 /pmc/articles/PMC3030536/ /pubmed/21226965 http://dx.doi.org/10.1186/1471-2105-12-18 Text en Copyright ©2011 Yang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Analysis on the reconstruction accuracy of the Fitch method for inferring ancestral states
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3030536/
https://www.ncbi.nlm.nih.gov/pubmed/21226965
http://dx.doi.org/10.1186/1471-2105-12-18