Recapitulating phylogenies using k-mers: from trees to networks

Bibliographic Details
Main Authors: Bernard, Guillaume; Ragan, Mark A.; Chan, Cheong Xin
Format: Online Article Text
Language: English
Published: F1000Research 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5224691/
https://www.ncbi.nlm.nih.gov/pubmed/28105314
http://dx.doi.org/10.12688/f1000research.10225.2
author Bernard, Guillaume
Ragan, Mark A.
Chan, Cheong Xin
collection PubMed
description Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared k-mers (subsequences at fixed length k). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel’s idea of ontogeny, we argue that genome phylogenies can be inferred using k-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.
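The abstract describes building a network of phylogenetic relatedness in which genomes are linked by the number of k-mers (subsequences of fixed length k) they share. The Python sketch below illustrates that idea on toy sequences; it is not the authors' implementation, and several choices are assumptions: k-mers are counted as distinct canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement), k-mers containing characters other than A/C/G/T are skipped, and an edge is kept only when the shared count reaches a hypothetical threshold `min_shared`. The function names `canonical`, `kmer_set` and `shared_kmer_network` are illustrative.

```python
from itertools import combinations

# Complement table for uppercase A/C/G/T (sequences are upper-cased first).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_set(seq: str, k: int) -> set[str]:
    """Distinct canonical k-mers of seq; k-mers with non-ACGT characters are skipped."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            kmers.add(canonical(kmer))
    return kmers


def shared_kmer_network(genomes: dict[str, str], k: int, min_shared: int = 1):
    """Edge list (name_a, name_b, shared) for every genome pair sharing at
    least min_shared distinct canonical k-mers (min_shared is hypothetical)."""
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    edges = []
    for a, b in combinations(sorted(sets), 2):
        shared = len(sets[a] & sets[b])
        if shared >= min_shared:
            edges.append((a, b, shared))
    return edges


if __name__ == "__main__":
    toy = {"A": "ACGTACGT", "B": "ACGTACTT", "C": "GGGGCCCC"}
    print(shared_kmer_network(toy, k=4))  # [('A', 'B', 3)]
```

The all-pairs loop is the simplest possible design: for the 143 genomes mentioned in the abstract it means 143 × 142 / 2 = 10,153 set intersections, and whole-genome k-mer sets are large, so a scalable tool would likely need more memory-efficient k-mer representations than Python sets.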
format Online
Article
Text
id pubmed-5224691
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher F1000Research
record_format MEDLINE/PubMed
spelling pubmed-5224691 2017-01-18 Recapitulating phylogenies using k-mers: from trees to networks Bernard, Guillaume Ragan, Mark A. Chan, Cheong Xin F1000Res Research Note F1000Research 2016-12-23 /pmc/articles/PMC5224691/ /pubmed/28105314 http://dx.doi.org/10.12688/f1000research.10225.2 Text en Copyright: © 2016 Bernard G et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Recapitulating phylogenies using k-mers: from trees to networks
topic Research Note