PhamClust: a phage genome clustering tool using proteomic equivalence

Bacteriophage comparative genomics is complex due to the mosaic nature of the genomes, and an underlying continuum of diversity confounds the identification of clear taxonomic boundaries. Nucleotide sequence comparison methods have been described for phage taxonomy, but they are computationally inte...

Bibliographic Details
Main Authors: Gauthier, Christian H., Hatfull, Graham F.
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2023
Subjects: Methods and Protocols
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654103/
https://www.ncbi.nlm.nih.gov/pubmed/37791778
http://dx.doi.org/10.1128/msystems.00443-23
author Gauthier, Christian H.
Hatfull, Graham F.
collection PubMed
description Bacteriophage comparative genomics is complex due to the mosaic nature of the genomes, and an underlying continuum of diversity confounds the identification of clear taxonomic boundaries. Nucleotide sequence comparison methods have been described for phage taxonomy, but they are computationally intensive and scale poorly as the number of sequenced phage genomes increases. Here, we describe PhamClust as a new bioinformatic approach for grouping phages according to their inter-genome relatedness. PhamClust calculates a proteomic equivalence quotient (PEQ) for each pair of phages based on amino acid sequence identity for those genes that are shared among phages. PEQ values span from 0% (no shared genes) to 100% (all genes shared at 100% identity), and using a large mycobacteriophage genome data set, we show that two-step clustering down to a PEQ of 25% constructs genome groupings (clusters) closely mirroring those constructed manually over time, with the differences attributable to historically arising incongruities rather than illogicalities in PhamClust. PEQ values can also faithfully divide clusters into subclusters, although the relationships are highly heterogeneous, with different PEQ values needed for the subdivision of different clusters. PhamClust can be used to assort any group of phages, including the RefSeq phage collection. IMPORTANCE: Bacteriophage genomes are pervasively mosaic, presenting challenges to describing phage relatedness. Here, we describe PhamClust, a bioinformatic approach for phage genome comparisons that uses a new metric of proteomic equivalence quotient for comparative genomics. PhamClust reliably assorts genomes into groups or clusters of related phages and can subdivide clusters into subclusters. PhamClust is computationally efficient and can readily process thousands of phage genomes. 
It is also a useful analytic tool for exploring the different types of inter-genome relatedness characteristic of phages in different clusters.
format Online
Article
Text
id pubmed-10654103
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-10654103 2023-10-04 PhamClust: a phage genome clustering tool using proteomic equivalence Gauthier, Christian H. Hatfull, Graham F. mSystems Methods and Protocols American Society for Microbiology 2023-10-04 /pmc/articles/PMC10654103/ /pubmed/37791778 http://dx.doi.org/10.1128/msystems.00443-23 Text en Copyright © 2023 Gauthier and Hatfull. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
title PhamClust: a phage genome clustering tool using proteomic equivalence
topic Methods and Protocols
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654103/
https://www.ncbi.nlm.nih.gov/pubmed/37791778
http://dx.doi.org/10.1128/msystems.00443-23