Network science inspires novel tree shape statistics

The shape of phylogenetic trees can be used to gain evolutionary insights. A tree’s shape specifies the connectivity of a tree, while its branch lengths reflect either the time or genetic distance between branching events; well-known measures of tree shape include the Colless and Sackin imbalance, w...


Bibliographic Details
Main authors: Chindelevitch, Leonid; Hayati, Maryam; Poon, Art F. Y.; Colijn, Caroline
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8699983/
https://www.ncbi.nlm.nih.gov/pubmed/34941890
http://dx.doi.org/10.1371/journal.pone.0259877
author Chindelevitch, Leonid
Hayati, Maryam
Poon, Art F. Y.
Colijn, Caroline
collection PubMed
description The shape of phylogenetic trees can be used to gain evolutionary insights. A tree’s shape specifies the connectivity of a tree, while its branch lengths reflect either the time or genetic distance between branching events; well-known measures of tree shape include the Colless and Sackin imbalance, which describe the asymmetry of a tree. In other contexts, network science has become an important paradigm for describing structural features of networks and using them to understand complex systems, ranging from protein interactions to social systems. Network science is thus a potential source of many novel ways to characterize tree shape, as trees are also networks. Here, we tailor tools from network science, including diameter, average path length, and betweenness, closeness, and eigenvector centrality, to summarize phylogenetic tree shapes. We thereby propose tree shape summaries that are complementary to both asymmetry and the frequencies of small configurations. These new statistics can be computed in linear time and scale well to describe the shapes of large trees. We apply these statistics, alongside some conventional tree statistics, to phylogenetic trees from three very different viruses (HIV, dengue fever and measles), from the same virus in different epidemiological scenarios (influenza A and HIV) and from simulation models known to produce trees with different shapes. Using mutual information and supervised learning algorithms, we find that the statistics adapted from network science perform as well as or better than conventional statistics. We describe their distributions and prove some basic results about their extreme values in a tree. We conclude that network science-based tree shape summaries are a promising addition to the toolkit of tree shape features. 
All our shape summaries, as well as functions to select the most discriminating ones for two sets of trees, are freely available as an R package at http://github.com/Leonardini/treeCentrality.
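To make the statistics named in the description concrete, here is a minimal Python sketch that computes the Sackin and Colless imbalance and the diameter of a rooted binary tree in a single linear-time pass. It is not the treeCentrality implementation: the nested-tuple tree encoding, the function name `shape_summary`, and the use of unit branch lengths are assumptions made only for illustration.

```python
def shape_summary(tree):
    """Return (n_leaves, height, sackin, colless, diameter) for a rooted binary tree.

    Hypothetical encoding: a leaf is any non-tuple label, and an internal
    node is a 2-tuple (left_subtree, right_subtree). Every edge has length 1.
    """
    if not isinstance(tree, tuple):
        # A single leaf: no edges, so every statistic is zero.
        return 1, 0, 0, 0, 0

    left, right = tree
    n_l, h_l, s_l, c_l, d_l = shape_summary(left)
    n_r, h_r, s_r, c_r, d_r = shape_summary(right)

    n_leaves = n_l + n_r
    height = 1 + max(h_l, h_r)
    # Sackin: sum of leaf depths. Each leaf below this node gains one edge.
    sackin = s_l + s_r + n_leaves
    # Colless: sum over internal nodes of |left leaf count - right leaf count|.
    colless = c_l + c_r + abs(n_l - n_r)
    # Diameter: the longest path either lies inside one subtree or passes
    # through this node, joining the deepest leaf on each side.
    diameter = max(d_l, d_r, h_l + h_r + 2)
    return n_leaves, height, sackin, colless, diameter


caterpillar = ((("a", "b"), "c"), "d")   # maximally unbalanced, 4 leaves
balanced = (("a", "b"), ("c", "d"))      # fully balanced, 4 leaves

print(shape_summary(caterpillar))  # (4, 3, 9, 3, 4)
print(shape_summary(balanced))     # (4, 2, 8, 0, 4)
```

In this small example the two imbalance measures separate the caterpillar from the balanced tree, while the diameter is the same for both. That illustrates why a summary taken from network science can carry different information from the asymmetry measures.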
format Online
Article
Text
id pubmed-8699983
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8699983 2021-12-24 PLoS One Research Article Public Library of Science 2021-12-23 /pmc/articles/PMC8699983/ /pubmed/34941890 http://dx.doi.org/10.1371/journal.pone.0259877 Text en © 2021 Chindelevitch et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article