Finding functional associations between prokaryotic virus orthologous groups: a proof of concept

BACKGROUND: The field of viromics has greatly benefited from recent developments in metagenomics, with significant efforts focusing on viral discovery. However, functional annotation of the increasing number of viral genomes is lagging behind. This is highlighted by the degree of annotation of the p...

Bibliographic Details
Main Authors: Pappas, Nikolaos, Dutilh, Bas E.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8442406/
https://www.ncbi.nlm.nih.gov/pubmed/34525942
http://dx.doi.org/10.1186/s12859-021-04343-w
_version_ 1783753001430679552
author Pappas, Nikolaos
Dutilh, Bas E.
author_facet Pappas, Nikolaos
Dutilh, Bas E.
author_sort Pappas, Nikolaos
collection PubMed
description BACKGROUND: The field of viromics has greatly benefited from recent developments in metagenomics, with significant efforts focusing on viral discovery. However, functional annotation of the increasing number of viral genomes is lagging behind. This is highlighted by the degree of annotation of the protein clusters in the prokaryotic Virus Orthologous Groups (pVOGs) database, with 83% of its current 9518 pVOGs having an unknown function. RESULTS: In this study we describe a machine learning approach to explore potential functional associations between pVOGs. We measure seven genomic features and use them as input to a Random Forest classifier to predict protein–protein interactions between pairs of pVOGs. After systematic evaluation of the model’s performance on 10 different datasets, we obtained a predictor with a mean accuracy of 0.77 and an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.83. Its application to a set of 2,133,027 pVOG-pVOG interactions allowed us to predict 267,265 putative interactions with a reported probability greater than 0.65. At an expected false discovery rate of 0.27, we placed 95.6% of the previously unannotated pVOGs in a functional context, by predicting their interaction with a pVOG that is functionally annotated. CONCLUSIONS: We believe that this proof-of-concept methodology, wrapped in a reproducible and automated workflow, can represent a significant step towards obtaining a more complete picture of bacteriophage biology. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04343-w.
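The evaluation quantities named in the abstract (accuracy, AUROC, and a probability cutoff of 0.65) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' workflow: the function names, the toy labels and scores, the 0.5 default threshold for accuracy, and the pVOG identifiers are all hypothetical. Only the counts 267,265 and 2,133,027 and the 0.65 cutoff come from the record above.

```python
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the chance that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s > threshold else 0 for s in scores]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


def call_interactions(pairs: Sequence[Tuple[str, str, float]],
                      threshold: float = 0.65) -> List[Tuple[str, str]]:
    """Keep pairs whose predicted probability is strictly above the cutoff."""
    return [(a, b) for a, b, prob in pairs if prob > threshold]


# Toy labels and classifier scores (hypothetical, for illustration only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

# Hypothetical pVOG pairs with predicted interaction probabilities.
candidate_pairs = [
    ("VOG0001", "VOG0002", 0.91),
    ("VOG0001", "VOG0003", 0.65),  # exactly at the cutoff, so not retained
    ("VOG0002", "VOG0003", 0.40),
]
retained = call_interactions(candidate_pairs)

# Share of candidate pairs retained, using the counts reported in the abstract.
fraction_retained = 267_265 / 2_133_027  # roughly 12.5% of candidate pairs

print(f"AUROC={auroc(labels, scores):.3f}")
print(f"accuracy={accuracy(labels, scores):.3f}")
print(f"retained={retained}")
print(f"fraction_retained={fraction_retained:.3f}")
```

The pairwise AUROC above is quadratic in the number of examples, which is fine for a toy illustration; at the scale of millions of pairs a rank-based computation would be the more practical choice.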
format Online
Article
Text
id pubmed-8442406
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8442406 2021-09-15 Finding functional associations between prokaryotic virus orthologous groups: a proof of concept Pappas, Nikolaos Dutilh, Bas E. BMC Bioinformatics Methodology Article
BioMed Central 2021-09-15 /pmc/articles/PMC8442406/ /pubmed/34525942 http://dx.doi.org/10.1186/s12859-021-04343-w Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Methodology Article
Pappas, Nikolaos
Dutilh, Bas E.
Finding functional associations between prokaryotic virus orthologous groups: a proof of concept
title Finding functional associations between prokaryotic virus orthologous groups: a proof of concept
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8442406/
https://www.ncbi.nlm.nih.gov/pubmed/34525942
http://dx.doi.org/10.1186/s12859-021-04343-w