Cophenetic metrics for phylogenetic trees, after Sokal and Rohlf

Bibliographic Details
Main authors: Cardona, Gabriel, Mir, Arnau, Rosselló, Francesc, Rotger, Lucía, Sánchez, David
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3716993/
https://www.ncbi.nlm.nih.gov/pubmed/23323711
http://dx.doi.org/10.1186/1471-2105-14-3
Description
Summary: BACKGROUND: Phylogenetic tree comparison metrics are an important tool in the study of evolution, and hence the definition of such metrics is an interesting problem in phylogenetics. In a paper in Taxon fifty years ago, Sokal and Rohlf proposed to measure quantitatively the difference between a pair of phylogenetic trees by first encoding them by means of their half-matrices of cophenetic values, and then comparing these matrices. This idea has been used several times since then to define dissimilarity measures between phylogenetic trees but, to our knowledge, no proper metric on weighted phylogenetic trees with nested taxa based on this idea has been formally defined and studied yet. Indeed, the cophenetic values of pairs of different taxa alone are not enough to single out phylogenetic trees with weighted arcs or nested taxa.

RESULTS: For every (rooted) phylogenetic tree T, let its cophenetic vector φ(T) consist of all cophenetic values of pairs of taxa in T together with all depths of taxa in T. It turns out that these cophenetic vectors single out weighted phylogenetic trees with nested taxa. We then define a family of cophenetic metrics d_{φ,p} by comparing these cophenetic vectors by means of L^p norms, and we study, either analytically or numerically, some of their basic properties: neighbors, diameter, distribution, and their rank correlation with each other and with other metrics.

CONCLUSIONS: The cophenetic metrics can be safely used on weighted phylogenetic trees with nested taxa and no restriction on degrees, and they can be computed in O(n^2) time, where n stands for the number of taxa. The metrics d_{φ,1} and d_{φ,2} have positively skewed distributions, and they show a low rank correlation with the Robinson-Foulds metric and the nodal metrics, and a very high correlation with each other and with the splitted nodal metrics. The diameter of d_{φ,p}, for [Formula: see text], is in O(n^((p+2)/p)), and thus for low p they are more discriminative, having a wider range of values.
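To make the definitions concrete, here is a minimal sketch in Python, assuming that the cophenetic value of two taxa is the (weighted) depth of their lowest common ancestor, that the depth of a taxon is the sum of arc weights from the root to it, and that d_{φ,p} is the L^p norm of the difference of the two cophenetic vectors. The dict-based tree representation and the helpers path_to_root, depth, lca, cophenetic_vector and cophenetic_distance are hypothetical illustrations, not the authors' code. The LCA lookup is deliberately naive, so this sketch does not achieve the O(n^2) bound stated above. The example trees differ only in the weight of the pendant arc to taxon a, which leaves every cophenetic value of distinct taxa unchanged and is detected only by the depth entries.

```python
from itertools import combinations_with_replacement

# Hypothetical representation: a rooted tree is a dict mapping each node to
# (parent, arc_weight); the root maps to None. A separate dict maps each taxon
# to its node, so internal nodes may carry taxa (nested taxa).

def path_to_root(tree, node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while tree[node] is not None:
        node = tree[node][0]
        path.append(node)
    return path

def depth(tree, node):
    """Sum of arc weights on the path from the root to node."""
    total = 0.0
    while tree[node] is not None:
        parent, weight = tree[node]
        total += weight
        node = parent
    return total

def lca(tree, u, v):
    """Naive lowest common ancestor: first ancestor of v that is also an ancestor of u."""
    ancestors_u = set(path_to_root(tree, u))
    for w in path_to_root(tree, v):
        if w in ancestors_u:
            return w
    raise ValueError("nodes do not share a root")

def cophenetic_vector(tree, labels, taxa):
    """phi(T): depth of LCA(i, j) for every pair i <= j in the given taxa order.
    For i == j this is just the depth of taxon i."""
    return [depth(tree, lca(tree, labels[a], labels[b]))
            for a, b in combinations_with_replacement(taxa, 2)]

def cophenetic_distance(t1, labels1, t2, labels2, taxa, p=1.0):
    """L^p distance between the cophenetic vectors of two trees on the same taxa."""
    if p < 1:
        raise ValueError("an L^p norm requires p >= 1")
    v1 = cophenetic_vector(t1, labels1, taxa)
    v2 = cophenetic_vector(t2, labels2, taxa)
    return sum(abs(x - y) ** p for x, y in zip(v1, v2)) ** (1.0 / p)

# Two trees ((a,b)x,c)r that differ only in the weight of the arc x -> a.
taxa = ["a", "b", "c"]
labels = {"a": "a", "b": "b", "c": "c"}
T1 = {"r": None, "x": ("r", 1.0), "a": ("x", 1.0), "b": ("x", 1.0), "c": ("r", 1.0)}
T2 = {"r": None, "x": ("r", 1.0), "a": ("x", 2.0), "b": ("x", 1.0), "c": ("r", 1.0)}

print(cophenetic_vector(T1, labels, taxa))  # [2.0, 1.0, 0.0, 2.0, 0.0, 1.0]
print(cophenetic_vector(T2, labels, taxa))  # [3.0, 1.0, 0.0, 2.0, 0.0, 1.0]
print(cophenetic_distance(T1, labels, T2, labels, taxa, p=1))  # 1.0
```

Only the first entry, the depth of a, differs between the two vectors; dropping the depth entries (i == j) would give distance 0, which is the failure the summary attributes to using cophenetic values of distinct taxa alone.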