
PHROG: families of prokaryotic virus proteins clustered using remote homology

Bibliographic Details
Main authors: Terzian, Paul; Olo Ndela, Eric; Galiez, Clovis; Lossouarn, Julien; Pérez Bucio, Rubén Enrique; Mom, Robin; Toussaint, Ariane; Petit, Marie-Agnès; Enault, François
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8341000/
https://www.ncbi.nlm.nih.gov/pubmed/34377978
http://dx.doi.org/10.1093/nargab/lqab067
Collection: PubMed
Description: Viruses are abundant, diverse and ancestral biological entities. Their diversity is high, both in the number of distinct protein families encountered and in the sequence heterogeneity within each family. The recent increase in sequenced viral genomes is a great opportunity to gain new insights into this diversity, and consequently calls for annotation resources that support functional and comparative analysis. Here, we introduce PHROG (Prokaryotic Virus Remote Homologous Groups), a library of viral protein families generated with a new clustering approach based on remote homology detection by HMM profile-profile comparisons. Considering 17 473 reference (pro)viruses of prokaryotes, 868 340 of the total 938 864 proteins were grouped into 38 880 clusters, a clustering two-fold deeper than that obtained with a classical strategy based on BLAST-like similarity searches, while the clusters remained homogeneous. Manual inspection of similarities to various reference sequence databases led to the annotation of 5108 clusters (containing 50.6% of the total protein dataset) with 705 different annotation terms, grouped into 9 functional categories specifically designed for viruses. We hope PHROG will be a useful tool for annotating future prokaryotic viral sequences, thus helping the scientific community better understand the evolution and ecology of these entities.
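The description outlines how PHROG families were formed by linking groups of proteins through HMM profile-profile comparisons. As a rough illustration only, the sketch below merges profiles into families by taking connected components (single linkage) of a graph whose edges are profile-profile hits at or above a score cutoff. The input format, the `merge_profiles` function, the profile names and the 0.9 cutoff are all hypothetical; the published pipeline may use a different graph-clustering rule and different thresholds, so this is not the authors' method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit format: (profile_a, profile_b, score in [0, 1]), e.g. parsed
# from an all-vs-all profile-profile search. Not a real tool's output format.
Hit = Tuple[str, str, float]


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.size: Dict[str, int] = {}

    def add(self, x: str) -> None:
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x: str) -> str:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller tree under the larger one
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def merge_profiles(profiles: Iterable[str], hits: Iterable[Hit],
                   min_score: float = 0.9) -> List[List[str]]:
    """Hypothetical helper: any hit with score >= min_score links two profiles."""
    uf = UnionFind()
    for p in profiles:
        uf.add(p)
    for a, b, score in hits:
        # Ignore weak hits and hits to profiles outside the input set.
        if score >= min_score and a in uf.parent and b in uf.parent:
            uf.union(a, b)
    groups: Dict[str, List[str]] = defaultdict(list)
    for p in list(uf.parent):
        groups[uf.find(p)].append(p)
    # Deterministic order: largest family first, members sorted by name.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


if __name__ == "__main__":
    demo_hits = [("P1", "P2", 0.98), ("P2", "P3", 0.95), ("P4", "P5", 0.40)]
    print(merge_profiles(["P1", "P2", "P3", "P4", "P5"], demo_hits))
    # prints [['P1', 'P2', 'P3'], ['P4'], ['P5']]
```

Single linkage is the simplest rule that lets remote similarities chain profiles into one family, but a single spurious hit can merge unrelated families, which is why dedicated graph-clustering methods are often preferred in practice. For scale, the figures in the description mean that about 92.5% of the proteins were grouped (868 340 / 938 864), with about 22 proteins per cluster on average (868 340 / 38 880).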
ID: pubmed-8341000
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: NAR Genom Bioinform (Standard Article), Oxford University Press, 2021-08-05
Copyright: © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Topic: Standard Article