Cargando…

Fast Dating Using Least-Squares Criteria and Algorithms

Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals wit...

Descripción completa

Detalles Bibliográficos
Autores principales: To, Thu-Hien, Jung, Matthieu, Lycett, Samantha, Gascuel, Olivier
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4678253/
https://www.ncbi.nlm.nih.gov/pubmed/26424727
http://dx.doi.org/10.1093/sysbio/syv068
_version_ 1782405425366302720
author To, Thu-Hien
Jung, Matthieu
Lycett, Samantha
Gascuel, Olivier
author_facet To, Thu-Hien
Jung, Matthieu
Lycett, Samantha
Gascuel, Olivier
author_sort To, Thu-Hien
collection PubMed
description Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals with such data. However, current sophisticated probabilistic approaches struggle to handle data sets of this size. Here, we present very fast dating algorithms, based on a Gaussian model closely related to the Langley–Fitch molecular-clock model. We show that this model is robust to uncorrelated violations of the molecular clock. Our algorithms apply to serial data, where the tips of the tree have been sampled through times. They estimate the substitution rate and the dates of all ancestral nodes. When the input tree is unrooted, they can provide an estimate for the root position, thus representing a new, practical alternative to the standard rooting methods (e.g., midpoint). Our algorithms exploit the tree (recursive) structure of the problem at hand, and the close relationships between least-squares and linear algebra. We distinguish between an unconstrained setting and the case where the temporal precedence constraint (i.e., an ancestral node must be older that its daughter nodes) is accounted for. With rooted trees, the former is solved using linear algebra in linear computing time (i.e., proportional to the number of taxa), while the resolution of the latter, constrained setting, is based on an active-set method that runs in nearly linear time. With unrooted trees the computing time becomes (nearly) quadratic (i.e., proportional to the square of the number of taxa). In all cases, very large input trees (>10,000 taxa) can easily be processed and transformed into time-scaled trees. We compare these algorithms to standard methods (root-to-tip, r8s version of Langley–Fitch method, and BEAST). Using simulated data, we show that their estimation accuracy is similar to that of the most sophisticated methods, while their computing time is much faster. We apply these algorithms on a large data set comprising 1194 strains of Influenza virus from the pdm09 H1N1 Human pandemic. Again the results show that these algorithms provide a very fast alternative with results similar to those of other computer programs. These algorithms are implemented in the LSD software (least-squares dating), which can be downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our data sets and detailed results. An Online Appendix, providing additional algorithm descriptions, tables, and figures can be found in the Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.968t3.
format Online
Article
Text
id pubmed-4678253
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-46782532015-12-15 Fast Dating Using Least-Squares Criteria and Algorithms To, Thu-Hien Jung, Matthieu Lycett, Samantha Gascuel, Olivier Syst Biol Regular Articles Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals with such data. However, current sophisticated probabilistic approaches struggle to handle data sets of this size. Here, we present very fast dating algorithms, based on a Gaussian model closely related to the Langley–Fitch molecular-clock model. We show that this model is robust to uncorrelated violations of the molecular clock. Our algorithms apply to serial data, where the tips of the tree have been sampled through times. They estimate the substitution rate and the dates of all ancestral nodes. When the input tree is unrooted, they can provide an estimate for the root position, thus representing a new, practical alternative to the standard rooting methods (e.g., midpoint). Our algorithms exploit the tree (recursive) structure of the problem at hand, and the close relationships between least-squares and linear algebra. We distinguish between an unconstrained setting and the case where the temporal precedence constraint (i.e., an ancestral node must be older that its daughter nodes) is accounted for. With rooted trees, the former is solved using linear algebra in linear computing time (i.e., proportional to the number of taxa), while the resolution of the latter, constrained setting, is based on an active-set method that runs in nearly linear time. With unrooted trees the computing time becomes (nearly) quadratic (i.e., proportional to the square of the number of taxa). In all cases, very large input trees (>10,000 taxa) can easily be processed and transformed into time-scaled trees. We compare these algorithms to standard methods (root-to-tip, r8s version of Langley–Fitch method, and BEAST). Using simulated data, we show that their estimation accuracy is similar to that of the most sophisticated methods, while their computing time is much faster. We apply these algorithms on a large data set comprising 1194 strains of Influenza virus from the pdm09 H1N1 Human pandemic. Again the results show that these algorithms provide a very fast alternative with results similar to those of other computer programs. These algorithms are implemented in the LSD software (least-squares dating), which can be downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our data sets and detailed results. An Online Appendix, providing additional algorithm descriptions, tables, and figures can be found in the Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.968t3. Oxford University Press 2016-01 2015-09-30 /pmc/articles/PMC4678253/ /pubmed/26424727 http://dx.doi.org/10.1093/sysbio/syv068 Text en © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Regular Articles
To, Thu-Hien
Jung, Matthieu
Lycett, Samantha
Gascuel, Olivier
Fast Dating Using Least-Squares Criteria and Algorithms
title Fast Dating Using Least-Squares Criteria and Algorithms
title_full Fast Dating Using Least-Squares Criteria and Algorithms
title_fullStr Fast Dating Using Least-Squares Criteria and Algorithms
title_full_unstemmed Fast Dating Using Least-Squares Criteria and Algorithms
title_short Fast Dating Using Least-Squares Criteria and Algorithms
title_sort fast dating using least-squares criteria and algorithms
topic Regular Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4678253/
https://www.ncbi.nlm.nih.gov/pubmed/26424727
http://dx.doi.org/10.1093/sysbio/syv068
work_keys_str_mv AT tothuhien fastdatingusingleastsquarescriteriaandalgorithms
AT jungmatthieu fastdatingusingleastsquarescriteriaandalgorithms
AT lycettsamantha fastdatingusingleastsquarescriteriaandalgorithms
AT gascuelolivier fastdatingusingleastsquarescriteriaandalgorithms