
k-mer Similarity, Networks of Microbial Genomes, and Taxonomic Rank

Microbial genomes have been shaped by parent-to-offspring (vertical) descent and lateral genetic transfer. These processes can be distinguished by alignment-based inference and comparison of phylogenetic trees for individual gene families, but this approach is not scalable to whole-genome sequences,...


Bibliographic Details
Main Authors: Bernard, Guillaume; Greenfield, Paul; Ragan, Mark A.; Chan, Cheong Xin
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6247013/
https://www.ncbi.nlm.nih.gov/pubmed/30505941
http://dx.doi.org/10.1128/mSystems.00257-18
_version_ 1783372425947250688
author Bernard, Guillaume
Greenfield, Paul
Ragan, Mark A.
Chan, Cheong Xin
collection PubMed
description Microbial genomes have been shaped by parent-to-offspring (vertical) descent and lateral genetic transfer. These processes can be distinguished by alignment-based inference and comparison of phylogenetic trees for individual gene families, but this approach is not scalable to whole-genome sequences, and a tree-like structure does not adequately capture how these processes impact microbial physiology. Here we adopted alignment-free approaches based on k-mer statistics to infer phylogenomic networks involving 2,783 completely sequenced bacterial and archaeal genomes and compared the contributions of rRNA, protein-coding, and plasmid sequences to these networks. Our results show that the phylogenomic signal arising from ribosomal RNAs is strong and extends broadly across all taxa, whereas that from plasmids is strong but restricted to closely related groups, particularly Proteobacteria. However, the signal from the other chromosomal regions is restricted in breadth. We show that mean k-mer similarity can correlate with taxonomic rank. We also link the implicated k-mers to genome annotation (thus, functions) and define core k-mers (thus, core functions) in specific phyletic groups. Highly conserved functions in most phyla include amino acid metabolism and transport as well as energy production and conversion. Intracellular trafficking and secretion are the most prominent core functions among Spirochaetes, whereas energy production and conversion are not highly conserved among the largely parasitic or commensal Tenericutes. These observations suggest that differential conservation of functions relates to niche specialization and evolutionary diversification of microbes. Our results demonstrate that k-mer approaches can be used to efficiently identify phylogenomic signals and conserved core functions at the multigenome scale. 
IMPORTANCE Genome evolution of microbes involves parent-to-offspring descent, and lateral genetic transfer that convolutes the phylogenomic signal. This study investigated phylogenomic signals among thousands of microbial genomes based on short subsequences without using multiple-sequence alignment. The signal from ribosomal RNAs is strong across all taxa, and the signal of plasmids is strong only in closely related groups, particularly Proteobacteria. However, the signal from other chromosomal regions (∼99% of the genomes) is remarkably restricted in breadth. The similarity of subsequences is found to correlate with taxonomic rank and informs on conserved and differential core functions relative to niche specialization and evolutionary diversification of microbes. These results provide a comprehensive, alignment-free view of microbial genome evolution as a network, beyond a tree-like structure.
format Online
Article
Text
id pubmed-6247013
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-6247013 2018-11-30 k-mer Similarity, Networks of Microbial Genomes, and Taxonomic Rank Bernard, Guillaume Greenfield, Paul Ragan, Mark A. Chan, Cheong Xin mSystems Research Article American Society for Microbiology 2018-11-20 /pmc/articles/PMC6247013/ /pubmed/30505941 http://dx.doi.org/10.1128/mSystems.00257-18 Text en Copyright © 2018 Bernard et al. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
title k-mer Similarity, Networks of Microbial Genomes, and Taxonomic Rank
topic Research Article