
Imputing missing distances in molecular phylogenetics

Missing data are frequently encountered in molecular phylogenetics, but there has been no accurate distance imputation method available for distance-based phylogenetic reconstruction. The general framework for distance imputation is to explore tree space and distance values to find an optimal combin...
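The least-squares idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a fixed, known tree topology and ordinary (unweighted) least squares; the function name `impute_on_fixed_tree`, the split-based tree encoding, and the five-taxon example are hypothetical and are not taken from the article or from DAMBE. The method described in the abstract also explores tree space, which this sketch does not do; one would have to repeat the fit over candidate topologies and compare the residual sums of squares.

```python
import itertools
import numpy as np

def impute_on_fixed_tree(taxa, splits, dist):
    """Fit branch lengths by ordinary least squares on a fixed topology,
    then predict the missing distances as path lengths on the fitted tree.

    taxa   : list of taxon names
    splits : one set of taxa per branch (the taxa on one side of that branch)
    dist   : dict frozenset({a, b}) -> observed distance; missing pairs are absent
    """
    pairs = [frozenset(p) for p in itertools.combinations(taxa, 2)]

    def row(pair):
        # A branch lies on the path between a and b iff its split separates them.
        a, b = tuple(pair)
        return [1.0 if (a in s) != (b in s) else 0.0 for s in splits]

    observed = [p for p in pairs if p in dist]
    X = np.array([row(p) for p in observed])
    y = np.array([dist[p] for p in observed])

    # Least-squares branch lengths from the observed distances only.
    # Note: if the missing pair's path is not determined by the observed
    # pairs, lstsq returns a minimum-norm solution and the prediction
    # is not meaningful.
    branch_lengths, *_ = np.linalg.lstsq(X, y, rcond=None)

    imputed = {p: float(np.dot(row(p), branch_lengths))
               for p in pairs if p not in dist}
    rss = float(np.sum((X @ branch_lengths - y) ** 2))
    return imputed, rss

# Hypothetical example: tree ((A,B),C,(D,E)) with made-up branch lengths.
taxa = ["A", "B", "C", "D", "E"]
splits = [{"A"}, {"B"}, {"C"}, {"D"}, {"E"}, {"A", "B"}, {"D", "E"}]
true_lengths = [0.1, 0.2, 0.3, 0.4, 0.5, 0.05, 0.15]

# Build an additive distance matrix from the tree, then hide d(A, D).
dist = {}
for a, b in itertools.combinations(taxa, 2):
    dist[frozenset({a, b})] = sum(
        length for s, length in zip(splits, true_lengths) if (a in s) != (b in s)
    )
del dist[frozenset({"A", "D"})]

imputed, rss = impute_on_fixed_tree(taxa, splits, dist)
# Close to 0.7 (the true path length), up to floating-point error.
print(imputed[frozenset({"A", "D"})], rss)
```

Because the example distances are exactly additive on the assumed tree, the imputed value matches the hidden one and the residual sum of squares is essentially zero. With real data the residual would be positive and could serve as the criterion for comparing candidate trees.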


Bibliographic Details
Main Author: Xia, Xuhua
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6063210/
https://www.ncbi.nlm.nih.gov/pubmed/30065887
http://dx.doi.org/10.7717/peerj.5321
_version_ 1783342515904053248
author Xia, Xuhua
author_facet Xia, Xuhua
author_sort Xia, Xuhua
collection PubMed
description Missing data are frequently encountered in molecular phylogenetics, but there has been no accurate distance imputation method available for distance-based phylogenetic reconstruction. The general framework for distance imputation is to explore tree space and distance values to find an optimal combination of output tree and imputed distances. Here I develop a least-squares method coupled with multivariate optimization to impute multiple missing distances in a distance matrix or from a set of aligned sequences with missing genes so that some sequences share no homologous sites (whose distances therefore need to be imputed). I show that phylogenetic trees can be inferred from distance matrices with about 10% of distances missing, and the accuracy of the resulting phylogenetic tree is almost as good as the tree from full information. The new method has the advantage over a recently published one in that it does not assume a molecular clock and is more accurate (comparable to the maximum likelihood method based on simulated sequences). I have implemented the function in DAMBE software, which is freely available at http://dambe.bio.uottawa.ca.
format Online
Article
Text
id pubmed-6063210
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-60632102018-07-31 Imputing missing distances in molecular phylogenetics Xia, Xuhua PeerJ Bioinformatics Missing data are frequently encountered in molecular phylogenetics, but there has been no accurate distance imputation method available for distance-based phylogenetic reconstruction. The general framework for distance imputation is to explore tree space and distance values to find an optimal combination of output tree and imputed distances. Here I develop a least-squares method coupled with multivariate optimization to impute multiple missing distances in a distance matrix or from a set of aligned sequences with missing genes so that some sequences share no homologous sites (whose distances therefore need to be imputed). I show that phylogenetic trees can be inferred from distance matrices with about 10% of distances missing, and the accuracy of the resulting phylogenetic tree is almost as good as the tree from full information. The new method has the advantage over a recently published one in that it does not assume a molecular clock and is more accurate (comparable to the maximum likelihood method based on simulated sequences). I have implemented the function in DAMBE software, which is freely available at http://dambe.bio.uottawa.ca. PeerJ Inc. 2018-07-24 /pmc/articles/PMC6063210/ /pubmed/30065887 http://dx.doi.org/10.7717/peerj.5321 Text en ©2018 Xia http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Xia, Xuhua
Imputing missing distances in molecular phylogenetics
title Imputing missing distances in molecular phylogenetics
title_full Imputing missing distances in molecular phylogenetics
title_fullStr Imputing missing distances in molecular phylogenetics
title_full_unstemmed Imputing missing distances in molecular phylogenetics
title_short Imputing missing distances in molecular phylogenetics
title_sort imputing missing distances in molecular phylogenetics
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6063210/
https://www.ncbi.nlm.nih.gov/pubmed/30065887
http://dx.doi.org/10.7717/peerj.5321
work_keys_str_mv AT xiaxuhua imputingmissingdistancesinmolecularphylogenetics