Cargando…

Imputing missing distances in molecular phylogenetics

Missing data are frequently encountered in molecular phylogenetics, but there has been no accurate distance imputation method available for distance-based phylogenetic reconstruction. The general framework for distance imputation is to explore tree space and distance values to find an optimal combin...

Descripción completa

Detalles Bibliográficos
Autor principal: Xia, Xuhua
Formato: Online Artículo Texto
Lenguaje:English
Publicado: PeerJ Inc. 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6063210/
https://www.ncbi.nlm.nih.gov/pubmed/30065887
http://dx.doi.org/10.7717/peerj.5321
Descripción
Sumario:Missing data are frequently encountered in molecular phylogenetics, but there has been no accurate distance imputation method available for distance-based phylogenetic reconstruction. The general framework for distance imputation is to explore tree space and distance values to find an optimal combination of output tree and imputed distances. Here I develop a least-square method coupled with multivariate optimization to impute multiple missing distance in a distance matrix or from a set of aligned sequences with missing genes so that some sequences share no homologous sites (whose distances therefore need to be imputed). I show that phylogenetic trees can be inferred from distance matrices with about 10% of distances missing, and the accuracy of the resulting phylogenetic tree is almost as good as the tree from full information. The new method has the advantage over a recently published one in that it does not assume a molecular clock and is more accurate (comparable to maximum likelihood method based on simulated sequences). I have implemented the function in DAMBE software, which is freely available at http://dambe.bio.uottawa.ca.