PhamClust: a phage genome clustering tool using proteomic equivalence


Bibliographic Details
Main authors: Gauthier, Christian H., Hatfull, Graham F.
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654103/
https://www.ncbi.nlm.nih.gov/pubmed/37791778
http://dx.doi.org/10.1128/msystems.00443-23
Description
Summary: Bacteriophage comparative genomics is complex due to the mosaic nature of the genomes, and an underlying continuum of diversity confounds the identification of clear taxonomic boundaries. Nucleotide sequence comparison methods have been described for phage taxonomy, but they are computationally intensive and scale poorly as the number of sequenced phage genomes increases. Here, we describe PhamClust as a new bioinformatic approach for grouping phages according to their inter-genome relatedness. PhamClust calculates a proteomic equivalence quotient (PEQ) for each pair of phages based on amino acid sequence identity for those genes that are shared among phages. PEQ values span from 0% (no shared genes) to 100% (all genes shared at 100% identity), and using a large mycobacteriophage genome data set, we show that two-step clustering down to a PEQ of 25% constructs genome groupings (clusters) closely mirroring those constructed manually over time, with the differences attributable to historically arising incongruities rather than illogicalities in PhamClust. PEQ values can also faithfully divide clusters into subclusters, although the relationships are highly heterogeneous, with different PEQ values needed for the subdivision of different clusters. PhamClust can be used to assort any group of phages, including the RefSeq phage collection.

IMPORTANCE: Bacteriophage genomes are pervasively mosaic, presenting challenges to describing phage relatedness. Here, we describe PhamClust, a bioinformatic approach for phage genome comparisons that uses a new metric of proteomic equivalence quotient for comparative genomics. PhamClust reliably assorts genomes into groups or clusters of related phages and can subdivide clusters into subclusters. PhamClust is computationally efficient and can readily process thousands of phage genomes. It is also a useful analytic tool for exploring the different types of inter-genome relatedness characteristic of phages in different clusters.
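
The summary gives only the endpoints of the PEQ scale (0% with no shared genes, 100% with all genes shared at 100% identity) and not the formula itself. The Python sketch below is one illustrative way a pairwise score with those endpoints could be built; it is not PhamClust's actual implementation. It assumes, hypothetically, that each genome is represented as a set of gene-family identifiers, that per-family amino acid identities (as fractions from 0 to 1) have already been computed elsewhere, and that the score is the product of a shared-gene fraction and the mean identity of shared genes. The function names `peq_sketch` and `group_by_threshold` are invented for this example, and the grouping step is a plain single-linkage threshold pass rather than the two-step clustering the summary mentions.

```python
from statistics import mean


def peq_sketch(genes_a, genes_b, identity):
    """Illustrative PEQ-like score in percent (assumed formula, not PhamClust's).

    genes_a, genes_b: sets of gene-family identifiers (hypothetical representation).
    identity: dict mapping each shared family to a fractional identity (0.0 to 1.0).
    """
    shared = genes_a & genes_b
    if not shared:
        return 0.0  # no shared genes gives 0%
    # Assumed: fraction of genes shared, averaged over both genomes.
    shared_fraction = 2 * len(shared) / (len(genes_a) + len(genes_b))
    # Assumed: mean amino acid identity across the shared families.
    mean_identity = mean(identity[family] for family in shared)
    return 100.0 * shared_fraction * mean_identity


def group_by_threshold(names, pair_scores, threshold=25.0):
    """Single-linkage grouping: join genomes whose pairwise score meets the threshold.

    A simplification for illustration; the 25% default echoes the summary,
    but this is not the two-step procedure described there.
    """
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())
```

Under these assumptions, two genomes with identical gene sets at full identity score 100, and two genomes that share half their genes at 70% mean identity score about 35, which would place them in the same group at a 25% threshold.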