RaFAH: Host prediction for viruses of Bacteria and Archaea based on protein content

Culture-independent approaches have recently shed light on the genomic diversity of viruses of prokaryotes. One fundamental question when trying to understand their ecological roles is: which host do they infect? To tackle this issue we developed a machine-learning approach named Random Forest Assig...

Bibliographic Details
Main authors: Coutinho, Felipe Hernandes, Zaragoza-Solas, Asier, López-Pérez, Mario, Barylski, Jakub, Zielezinski, Andrzej, Dutilh, Bas E., Edwards, Robert, Rodriguez-Valera, Francisco
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8276007/
https://www.ncbi.nlm.nih.gov/pubmed/34286299
http://dx.doi.org/10.1016/j.patter.2021.100274
_version_ 1783721827930996736
author Coutinho, Felipe Hernandes
Zaragoza-Solas, Asier
López-Pérez, Mario
Barylski, Jakub
Zielezinski, Andrzej
Dutilh, Bas E.
Edwards, Robert
Rodriguez-Valera, Francisco
author_facet Coutinho, Felipe Hernandes
Zaragoza-Solas, Asier
López-Pérez, Mario
Barylski, Jakub
Zielezinski, Andrzej
Dutilh, Bas E.
Edwards, Robert
Rodriguez-Valera, Francisco
author_sort Coutinho, Felipe Hernandes
collection PubMed
description Culture-independent approaches have recently shed light on the genomic diversity of viruses of prokaryotes. One fundamental question when trying to understand their ecological roles is: which host do they infect? To tackle this issue we developed a machine-learning approach named Random Forest Assignment of Hosts (RaFAH), that uses scores to 43,644 protein clusters to assign hosts to complete or fragmented genomes of viruses of Archaea and Bacteria. RaFAH displayed performance comparable with that of other methods for virus-host prediction in three different benchmarks encompassing viruses from RefSeq, single amplified genomes, and metagenomes. RaFAH was applied to assembled metagenomic datasets of uncultured viruses from eight different biomes of medical, biotechnological, and environmental relevance. Our analyses led to the identification of 537 sequences of archaeal viruses representing unknown lineages, whose genomes encode novel auxiliary metabolic genes, shedding light on how these viruses interfere with the host molecular machinery. RaFAH is available at https://sourceforge.net/projects/rafah/.
format Online
Article
Text
id pubmed-8276007
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8276007 2021-07-19 RaFAH: Host prediction for viruses of Bacteria and Archaea based on protein content Coutinho, Felipe Hernandes Zaragoza-Solas, Asier López-Pérez, Mario Barylski, Jakub Zielezinski, Andrzej Dutilh, Bas E. Edwards, Robert Rodriguez-Valera, Francisco Patterns (N Y) Descriptor Culture-independent approaches have recently shed light on the genomic diversity of viruses of prokaryotes. One fundamental question when trying to understand their ecological roles is: which host do they infect? To tackle this issue we developed a machine-learning approach named Random Forest Assignment of Hosts (RaFAH), that uses scores to 43,644 protein clusters to assign hosts to complete or fragmented genomes of viruses of Archaea and Bacteria. RaFAH displayed performance comparable with that of other methods for virus-host prediction in three different benchmarks encompassing viruses from RefSeq, single amplified genomes, and metagenomes. RaFAH was applied to assembled metagenomic datasets of uncultured viruses from eight different biomes of medical, biotechnological, and environmental relevance. Our analyses led to the identification of 537 sequences of archaeal viruses representing unknown lineages, whose genomes encode novel auxiliary metabolic genes, shedding light on how these viruses interfere with the host molecular machinery. RaFAH is available at https://sourceforge.net/projects/rafah/. Elsevier 2021-06-15 /pmc/articles/PMC8276007/ /pubmed/34286299 http://dx.doi.org/10.1016/j.patter.2021.100274 Text en © 2021 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Descriptor
Coutinho, Felipe Hernandes
Zaragoza-Solas, Asier
López-Pérez, Mario
Barylski, Jakub
Zielezinski, Andrzej
Dutilh, Bas E.
Edwards, Robert
Rodriguez-Valera, Francisco
RaFAH: Host prediction for viruses of Bacteria and Archaea based on protein content
title RaFAH: Host prediction for viruses of Bacteria and Archaea based on protein content
title_full RaFAH: Host prediction for viruses of Bacteria and Archaea based on protein content
title_fullStr RaFAH: Host prediction for viruses of Bacteria and Archaea based on protein content
title_full_unstemmed RaFAH: Host prediction for viruses of Bacteria and Archaea based on protein content
title_short RaFAH: Host prediction for viruses of Bacteria and Archaea based on protein content
title_sort rafah: host prediction for viruses of bacteria and archaea based on protein content
topic Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8276007/
https://www.ncbi.nlm.nih.gov/pubmed/34286299
http://dx.doi.org/10.1016/j.patter.2021.100274
work_keys_str_mv AT coutinhofelipehernandes rafahhostpredictionforvirusesofbacteriaandarchaeabasedonproteincontent
AT zaragozasolasasier rafahhostpredictionforvirusesofbacteriaandarchaeabasedonproteincontent
AT lopezperezmario rafahhostpredictionforvirusesofbacteriaandarchaeabasedonproteincontent
AT barylskijakub rafahhostpredictionforvirusesofbacteriaandarchaeabasedonproteincontent
AT zielezinskiandrzej rafahhostpredictionforvirusesofbacteriaandarchaeabasedonproteincontent
AT dutilhbase rafahhostpredictionforvirusesofbacteriaandarchaeabasedonproteincontent
AT edwardsrobert rafahhostpredictionforvirusesofbacteriaandarchaeabasedonproteincontent
AT rodriguezvalerafrancisco rafahhostpredictionforvirusesofbacteriaandarchaeabasedonproteincontent