PHROG: families of prokaryotic virus proteins clustered using remote homology
Viruses are abundant, diverse and ancestral biological entities. Their diversity is high, both in terms of the number of different protein families encountered and in the sequence heterogeneity of each protein family. The recent increase in sequenced viral genomes constitutes a great opportunity to...
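Main Authors: | Terzian, Paul; Olo Ndela, Eric; Galiez, Clovis; Lossouarn, Julien; Pérez Bucio, Rubén Enrique; Mom, Robin; Toussaint, Ariane; Petit, Marie-Agnès; Enault, François |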
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Standard Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8341000/ https://www.ncbi.nlm.nih.gov/pubmed/34377978 http://dx.doi.org/10.1093/nargab/lqab067 |
_version_ | 1783733859475521536 |
---|---|
author | Terzian, Paul; Olo Ndela, Eric; Galiez, Clovis; Lossouarn, Julien; Pérez Bucio, Rubén Enrique; Mom, Robin; Toussaint, Ariane; Petit, Marie-Agnès; Enault, François |
author_facet | Terzian, Paul; Olo Ndela, Eric; Galiez, Clovis; Lossouarn, Julien; Pérez Bucio, Rubén Enrique; Mom, Robin; Toussaint, Ariane; Petit, Marie-Agnès; Enault, François |
author_sort | Terzian, Paul |
collection | PubMed |
description | Viruses are abundant, diverse and ancestral biological entities. Their diversity is high, both in terms of the number of different protein families encountered and in the sequence heterogeneity of each protein family. The recent increase in sequenced viral genomes constitutes a great opportunity to gain new insights into this diversity and consequently urges the development of annotation resources to help functional and comparative analysis. Here, we introduce PHROG (Prokaryotic Virus Remote Homologous Groups), a library of viral protein families generated using a new clustering approach based on remote homology detection by HMM profile-profile comparisons. Considering 17 473 reference (pro)viruses of prokaryotes, 868 340 of the total 938 864 proteins were grouped into 38 880 clusters that proved to be a 2-fold deeper clustering than using a classical strategy based on BLAST-like similarity searches, and yet to remain homogeneous. Manual inspection of similarities to various reference sequence databases led to the annotation of 5108 clusters (containing 50.6 % of the total protein dataset) with 705 different annotation terms, included in 9 functional categories, specifically designed for viruses. Hopefully, PHROG will be a useful tool to better annotate future prokaryotic viral sequences thus helping the scientific community to better understand the evolution and ecology of these entities. |
format | Online Article Text |
id | pubmed-8341000 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-83410002021-08-09 PHROG: families of prokaryotic virus proteins clustered using remote homology Terzian, Paul Olo Ndela, Eric Galiez, Clovis Lossouarn, Julien Pérez Bucio, Rubén Enrique Mom, Robin Toussaint, Ariane Petit, Marie-Agnès Enault, François NAR Genom Bioinform Standard Article Viruses are abundant, diverse and ancestral biological entities. Their diversity is high, both in terms of the number of different protein families encountered and in the sequence heterogeneity of each protein family. The recent increase in sequenced viral genomes constitutes a great opportunity to gain new insights into this diversity and consequently urges the development of annotation resources to help functional and comparative analysis. Here, we introduce PHROG (Prokaryotic Virus Remote Homologous Groups), a library of viral protein families generated using a new clustering approach based on remote homology detection by HMM profile-profile comparisons. Considering 17 473 reference (pro)viruses of prokaryotes, 868 340 of the total 938 864 proteins were grouped into 38 880 clusters that proved to be a 2-fold deeper clustering than using a classical strategy based on BLAST-like similarity searches, and yet to remain homogeneous. Manual inspection of similarities to various reference sequence databases led to the annotation of 5108 clusters (containing 50.6 % of the total protein dataset) with 705 different annotation terms, included in 9 functional categories, specifically designed for viruses. Hopefully, PHROG will be a useful tool to better annotate future prokaryotic viral sequences thus helping the scientific community to better understand the evolution and ecology of these entities. Oxford University Press 2021-08-05 /pmc/articles/PMC8341000/ /pubmed/34377978 http://dx.doi.org/10.1093/nargab/lqab067 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. 
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Standard Article Terzian, Paul Olo Ndela, Eric Galiez, Clovis Lossouarn, Julien Pérez Bucio, Rubén Enrique Mom, Robin Toussaint, Ariane Petit, Marie-Agnès Enault, François PHROG: families of prokaryotic virus proteins clustered using remote homology |
title | PHROG: families of prokaryotic virus proteins clustered using remote homology |
title_full | PHROG: families of prokaryotic virus proteins clustered using remote homology |
title_fullStr | PHROG: families of prokaryotic virus proteins clustered using remote homology |
title_full_unstemmed | PHROG: families of prokaryotic virus proteins clustered using remote homology |
title_short | PHROG: families of prokaryotic virus proteins clustered using remote homology |
title_sort | phrog: families of prokaryotic virus proteins clustered using remote homology |
topic | Standard Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8341000/ https://www.ncbi.nlm.nih.gov/pubmed/34377978 http://dx.doi.org/10.1093/nargab/lqab067 |
work_keys_str_mv | AT terzianpaul phrogfamiliesofprokaryoticvirusproteinsclusteredusingremotehomology AT olondelaeric phrogfamiliesofprokaryoticvirusproteinsclusteredusingremotehomology AT galiezclovis phrogfamiliesofprokaryoticvirusproteinsclusteredusingremotehomology AT lossouarnjulien phrogfamiliesofprokaryoticvirusproteinsclusteredusingremotehomology AT perezbuciorubenenrique phrogfamiliesofprokaryoticvirusproteinsclusteredusingremotehomology AT momrobin phrogfamiliesofprokaryoticvirusproteinsclusteredusingremotehomology AT toussaintariane phrogfamiliesofprokaryoticvirusproteinsclusteredusingremotehomology AT petitmarieagnes phrogfamiliesofprokaryoticvirusproteinsclusteredusingremotehomology AT enaultfrancois phrogfamiliesofprokaryoticvirusproteinsclusteredusingremotehomology |