Imputing missing distances in molecular phylogenetics
Missing data are frequently encountered in molecular phylogenetics, but there has been no accurate distance imputation method available for distance-based phylogenetic reconstruction. The general framework for distance imputation is to explore tree space and distance values to find an optimal combin...
Main author: | Xia, Xuhua |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2018 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6063210/ https://www.ncbi.nlm.nih.gov/pubmed/30065887 http://dx.doi.org/10.7717/peerj.5321 |
_version_ | 1783342515904053248 |
---|---|
author | Xia, Xuhua |
author_facet | Xia, Xuhua |
author_sort | Xia, Xuhua |
collection | PubMed |
description | Missing data are frequently encountered in molecular phylogenetics, but there has been no accurate distance imputation method available for distance-based phylogenetic reconstruction. The general framework for distance imputation is to explore tree space and distance values to find an optimal combination of output tree and imputed distances. Here I develop a least-squares method coupled with multivariate optimization to impute multiple missing distances in a distance matrix or from a set of aligned sequences with missing genes so that some sequences share no homologous sites (whose distances therefore need to be imputed). I show that phylogenetic trees can be inferred from distance matrices with about 10% of distances missing, and the accuracy of the resulting phylogenetic tree is almost as good as the tree from full information. The new method has the advantage over a recently published one in that it does not assume a molecular clock and is more accurate (comparable to the maximum likelihood method based on simulated sequences). I have implemented the function in DAMBE software, which is freely available at http://dambe.bio.uottawa.ca. |
format | Online Article Text |
id | pubmed-6063210 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6063210 2018-07-31 Imputing missing distances in molecular phylogenetics Xia, Xuhua PeerJ Bioinformatics Missing data are frequently encountered in molecular phylogenetics, but there has been no accurate distance imputation method available for distance-based phylogenetic reconstruction. The general framework for distance imputation is to explore tree space and distance values to find an optimal combination of output tree and imputed distances. Here I develop a least-squares method coupled with multivariate optimization to impute multiple missing distances in a distance matrix or from a set of aligned sequences with missing genes so that some sequences share no homologous sites (whose distances therefore need to be imputed). I show that phylogenetic trees can be inferred from distance matrices with about 10% of distances missing, and the accuracy of the resulting phylogenetic tree is almost as good as the tree from full information. The new method has the advantage over a recently published one in that it does not assume a molecular clock and is more accurate (comparable to the maximum likelihood method based on simulated sequences). I have implemented the function in DAMBE software, which is freely available at http://dambe.bio.uottawa.ca. PeerJ Inc. 2018-07-24 /pmc/articles/PMC6063210/ /pubmed/30065887 http://dx.doi.org/10.7717/peerj.5321 Text en ©2018 Xia http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Xia, Xuhua Imputing missing distances in molecular phylogenetics |
title | Imputing missing distances in molecular phylogenetics |
title_full | Imputing missing distances in molecular phylogenetics |
title_fullStr | Imputing missing distances in molecular phylogenetics |
title_full_unstemmed | Imputing missing distances in molecular phylogenetics |
title_short | Imputing missing distances in molecular phylogenetics |
title_sort | imputing missing distances in molecular phylogenetics |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6063210/ https://www.ncbi.nlm.nih.gov/pubmed/30065887 http://dx.doi.org/10.7717/peerj.5321 |
work_keys_str_mv | AT xiaxuhua imputingmissingdistancesinmolecularphylogenetics |
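
The description in this record mentions a least-squares approach to imputing missing distances. The sketch below illustrates only the simplest piece of that idea and is not the article's or DAMBE's implementation. It assumes the tree topology is already fixed (the article also searches tree space), fits branch lengths by ordinary least squares to the known distances only, and reads the missing distance off the fitted tree as a path length. The five-taxon topology, the branch lengths, and the names `fit_branch_lengths`, `path_length`, and `solve_linear` are all hypothetical, chosen for illustration.

```python
from itertools import combinations


def solve_linear(M, y):
    """Solve M x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [row[:] + [y[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("branch lengths not identifiable from known distances")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def on_path(split, i, j):
    """A branch lies on the path i..j iff it separates the two taxa."""
    return (i in split) != (j in split)


def fit_branch_lengths(splits, known):
    """Least-squares branch lengths from known distances {(i, j): d}.

    Each branch is given as the set of taxa on one side of it (a split).
    Solves the normal equations (A^T A) x = A^T b; no non-negativity
    constraint is applied in this sketch.
    """
    pairs = list(known)
    A = [[1.0 if on_path(s, i, j) else 0.0 for s in splits] for i, j in pairs]
    b = [known[p] for p in pairs]
    k = len(splits)
    AtA = [[sum(row[p] * row[q] for row in A) for q in range(k)] for p in range(k)]
    Atb = [sum(row[p] * bi for row, bi in zip(A, b)) for p in range(k)]
    return solve_linear(AtA, Atb)


def path_length(splits, lengths, i, j):
    """Tree (patristic) distance: sum of branches separating i and j."""
    return sum(l for s, l in zip(splits, lengths) if on_path(s, i, j))


# Hypothetical unrooted topology ((A,B),C,(D,E)): 5 terminal + 2 internal branches.
taxa = "ABCDE"
splits = [{"A"}, {"B"}, {"C"}, {"D"}, {"E"}, {"A", "B"}, {"D", "E"}]
true_lengths = [0.10, 0.20, 0.15, 0.30, 0.25, 0.05, 0.08]

# Build an additive distance matrix, then hide the A-D entry.
full = {(i, j): path_length(splits, true_lengths, i, j)
        for i, j in combinations(taxa, 2)}
known = {p: d for p, d in full.items() if p != ("A", "D")}

fitted = fit_branch_lengths(splits, known)
imputed_AD = path_length(splits, fitted, "A", "D")
print(f"imputed A-D distance: {imputed_AD:.4f}")
```

Because the example distances are exactly additive on the assumed tree, the imputed value matches the hidden one up to rounding. With real data the fit is approximate, and choosing among candidate topologies would require comparing their residual sums of squares, which this sketch does not attempt.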