PhamClust: a phage genome clustering tool using proteomic equivalence
Bacteriophage comparative genomics is complex due to the mosaic nature of the genomes, and an underlying continuum of diversity confounds the identification of clear taxonomic boundaries. Nucleotide sequence comparison methods have been described for phage taxonomy, but they are computationally inte...
Main authors: | Gauthier, Christian H.; Hatfull, Graham F. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Society for Microbiology, 2023 |
Subjects: | Methods and Protocols |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654103/ https://www.ncbi.nlm.nih.gov/pubmed/37791778 http://dx.doi.org/10.1128/msystems.00443-23 |
_version_ | 1785136558795390976 |
---|---|
author | Gauthier, Christian H.; Hatfull, Graham F. |
author_facet | Gauthier, Christian H.; Hatfull, Graham F. |
author_sort | Gauthier, Christian H. |
collection | PubMed |
description | Bacteriophage comparative genomics is complex due to the mosaic nature of the genomes, and an underlying continuum of diversity confounds the identification of clear taxonomic boundaries. Nucleotide sequence comparison methods have been described for phage taxonomy, but they are computationally intensive and scale poorly as the number of sequenced phage genomes increases. Here, we describe PhamClust as a new bioinformatic approach for grouping phages according to their inter-genome relatedness. PhamClust calculates a proteomic equivalence quotient (PEQ) for each pair of phages based on amino acid sequence identity for those genes that are shared among phages. PEQ values span from 0% (no shared genes) to 100% (all genes shared at 100% identity), and using a large mycobacteriophage genome data set, we show that two-step clustering down to a PEQ of 25% constructs genome groupings (clusters) closely mirroring those constructed manually over time, with the differences attributable to historically arising incongruities rather than illogicalities in PhamClust. PEQ values can also faithfully divide clusters into subclusters, although the relationships are highly heterogeneous, with different PEQ values needed for the subdivision of different clusters. PhamClust can be used to assort any group of phages, including the RefSeq phage collection. IMPORTANCE: Bacteriophage genomes are pervasively mosaic, presenting challenges to describing phage relatedness. Here, we describe PhamClust, a bioinformatic approach for phage genome comparisons that uses a new metric of proteomic equivalence quotient for comparative genomics. PhamClust reliably assorts genomes into groups or clusters of related phages and can subdivide clusters into subclusters. PhamClust is computationally efficient and can readily process thousands of phage genomes. It is also a useful analytic tool for exploring the different types of inter-genome relatedness characteristic of phages in different clusters. |
format | Online Article Text |
id | pubmed-10654103 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Society for Microbiology |
record_format | MEDLINE/PubMed |
spelling | pubmed-10654103 2023-10-04 PhamClust: a phage genome clustering tool using proteomic equivalence Gauthier, Christian H. Hatfull, Graham F. mSystems Methods and Protocols American Society for Microbiology 2023-10-04 /pmc/articles/PMC10654103/ /pubmed/37791778 http://dx.doi.org/10.1128/msystems.00443-23 Text en Copyright © 2023 Gauthier and Hatfull. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/). |
title | PhamClust: a phage genome clustering tool using proteomic equivalence |
topic | Methods and Protocols |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654103/ https://www.ncbi.nlm.nih.gov/pubmed/37791778 http://dx.doi.org/10.1128/msystems.00443-23 |
work_keys_str_mv | AT gauthierchristianh phamclustaphagegenomeclusteringtoolusingproteomicequivalence AT hatfullgrahamf phamclustaphagegenomeclusteringtoolusingproteomicequivalence |
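
The description above says that each pair of phages receives a score between 0% (no shared genes) and 100% (all genes shared at 100% identity), and that genomes are grouped down to a threshold of 25%. The record does not give the PEQ formula or the details of the two-step clustering, so the following Python sketch is only an illustration under stated assumptions: `peq_like` is a hypothetical stand-in score that satisfies the same two boundary conditions, and `group_by_threshold` is a simple single-pass connected-components grouping, not PhamClust's published procedure. All function and parameter names are invented for this example.

```python
def peq_like(identities, n_genes_a, n_genes_b):
    """Illustrative pairwise score in percent (NOT the published PEQ formula).

    identities: dict mapping each shared gene family to the fractional
                amino acid identity (0.0 to 1.0) between the two genomes.
    n_genes_a, n_genes_b: total gene counts of the two genomes.
    """
    if n_genes_a == 0 or n_genes_b == 0:
        return 0.0
    shared_identity = sum(identities.values())
    # Each shared gene counts once per genome, hence the factor of 2.
    return 100.0 * 2.0 * shared_identity / (n_genes_a + n_genes_b)


def group_by_threshold(names, scores, threshold=25.0):
    """Group genomes linked by any pairwise score >= threshold.

    scores: dict mapping (name_a, name_b) tuples to a percent score.
    This is a single-step simplification using union-find.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())
```

As a usage example under the same assumptions, calling `group_by_threshold(["A", "B", "C"], {("A", "B"): 80.0, ("A", "C"): 10.0, ("B", "C"): 5.0})` places A and B in one group and leaves C on its own, because only the A-B score reaches the 25% threshold.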