Informative Regions In Viral Genomes


Bibliographic Details
Main authors: Moreno-Gallego, Jaime Leonardo, Reyes, Alejandro
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8234400/
https://www.ncbi.nlm.nih.gov/pubmed/34207030
http://dx.doi.org/10.3390/v13061164
_version_ 1783714075246592000
author Moreno-Gallego, Jaime Leonardo
Reyes, Alejandro
author_facet Moreno-Gallego, Jaime Leonardo
Reyes, Alejandro
author_sort Moreno-Gallego, Jaime Leonardo
collection PubMed
description Viruses, far from being just parasites affecting hosts’ fitness, are major players in any microbial ecosystem. In spite of their broad abundance, viruses, in particular bacteriophages, remain largely unknown, since only about 20% of sequences obtained from viral community DNA surveys could be annotated by comparison with public databases. In order to shed some light on this genetic dark matter, we expanded the search for orthologous groups as potential markers of viral taxonomy from bacteriophages to include eukaryotic viruses, establishing a set of 31,150 ViPhOGs (Eukaryotic Viruses and Phages Orthologous Groups). To do this, we examined the non-redundant viral diversity stored in public databases, predicted proteins in genomes lacking such information, and used all annotated and predicted proteins to identify potential protein domains. The clustering of domains and unannotated regions into orthologous groups was done using cogSoft. Finally, we employed a random forest implementation to classify genomes by taxonomy and found that the presence or absence of ViPhOGs is significantly associated with taxonomy. Furthermore, we established a set of 1457 ViPhOGs that, given their importance for the classification, could be considered markers or signatures for the different taxonomic groups defined by the ICTV at the order, family, and genus levels.
format Online
Article
Text
id pubmed-8234400
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8234400 2021-06-27 Informative Regions In Viral Genomes Moreno-Gallego, Jaime Leonardo Reyes, Alejandro Viruses Article Viruses, far from being just parasites affecting hosts’ fitness, are major players in any microbial ecosystem. In spite of their broad abundance, viruses, in particular bacteriophages, remain largely unknown, since only about 20% of sequences obtained from viral community DNA surveys could be annotated by comparison with public databases. In order to shed some light on this genetic dark matter, we expanded the search for orthologous groups as potential markers of viral taxonomy from bacteriophages to include eukaryotic viruses, establishing a set of 31,150 ViPhOGs (Eukaryotic Viruses and Phages Orthologous Groups). To do this, we examined the non-redundant viral diversity stored in public databases, predicted proteins in genomes lacking such information, and used all annotated and predicted proteins to identify potential protein domains. The clustering of domains and unannotated regions into orthologous groups was done using cogSoft. Finally, we employed a random forest implementation to classify genomes by taxonomy and found that the presence or absence of ViPhOGs is significantly associated with taxonomy. Furthermore, we established a set of 1457 ViPhOGs that, given their importance for the classification, could be considered markers or signatures for the different taxonomic groups defined by the ICTV at the order, family, and genus levels. MDPI 2021-06-18 /pmc/articles/PMC8234400/ /pubmed/34207030 http://dx.doi.org/10.3390/v13061164 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Moreno-Gallego, Jaime Leonardo
Reyes, Alejandro
Informative Regions In Viral Genomes
title Informative Regions In Viral Genomes
title_full Informative Regions In Viral Genomes
title_fullStr Informative Regions In Viral Genomes
title_full_unstemmed Informative Regions In Viral Genomes
title_short Informative Regions In Viral Genomes
title_sort informative regions in viral genomes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8234400/
https://www.ncbi.nlm.nih.gov/pubmed/34207030
http://dx.doi.org/10.3390/v13061164
work_keys_str_mv AT morenogallegojaimeleonardo informativeregionsinviralgenomes
AT reyesalejandro informativeregionsinviralgenomes