Finding functional associations between prokaryotic virus orthologous groups: a proof of concept
BACKGROUND: The field of viromics has greatly benefited from recent developments in metagenomics, with significant efforts focusing on viral discovery. However, functional annotation of the increasing number of viral genomes is lagging behind. This is highlighted by the degree of annotation of the p...
Main Authors: | Pappas, Nikolaos; Dutilh, Bas E. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8442406/ https://www.ncbi.nlm.nih.gov/pubmed/34525942 http://dx.doi.org/10.1186/s12859-021-04343-w |
_version_ | 1783753001430679552 |
---|---|
author | Pappas, Nikolaos Dutilh, Bas E. |
author_facet | Pappas, Nikolaos Dutilh, Bas E. |
author_sort | Pappas, Nikolaos |
collection | PubMed |
description | BACKGROUND: The field of viromics has greatly benefited from recent developments in metagenomics, with significant efforts focusing on viral discovery. However, functional annotation of the increasing number of viral genomes is lagging behind. This is highlighted by the degree of annotation of the protein clusters in the prokaryotic Virus Orthologous Groups (pVOGs) database, with 83% of its current 9518 pVOGs having an unknown function. RESULTS: In this study we describe a machine learning approach to explore potential functional associations between pVOGs. We measure seven genomic features and use them as input to a Random Forest classifier to predict protein–protein interactions between pairs of pVOGs. After systematic evaluation of the model’s performance on 10 different datasets, we obtained a predictor with a mean accuracy of 0.77 and Area Under the Receiver Operating Characteristic curve (AUROC) score of 0.83. Its application to a set of 2,133,027 pVOG-pVOG interactions allowed us to predict 267,265 putative interactions with a reported probability greater than 0.65. At an expected false discovery rate of 0.27, we placed 95.6% of the previously unannotated pVOGs in a functional context, by predicting their interaction with a pVOG that is functionally annotated. CONCLUSIONS: We believe that this proof-of-concept methodology, wrapped in a reproducible and automated workflow, can represent a significant step towards obtaining a more complete picture of bacteriophage biology. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04343-w. |
format | Online Article Text |
id | pubmed-8442406 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8442406 2021-09-15 Finding functional associations between prokaryotic virus orthologous groups: a proof of concept Pappas, Nikolaos Dutilh, Bas E. BMC Bioinformatics Methodology Article BACKGROUND: The field of viromics has greatly benefited from recent developments in metagenomics, with significant efforts focusing on viral discovery. However, functional annotation of the increasing number of viral genomes is lagging behind. This is highlighted by the degree of annotation of the protein clusters in the prokaryotic Virus Orthologous Groups (pVOGs) database, with 83% of its current 9518 pVOGs having an unknown function. RESULTS: In this study we describe a machine learning approach to explore potential functional associations between pVOGs. We measure seven genomic features and use them as input to a Random Forest classifier to predict protein–protein interactions between pairs of pVOGs. After systematic evaluation of the model’s performance on 10 different datasets, we obtained a predictor with a mean accuracy of 0.77 and Area Under the Receiver Operating Characteristic curve (AUROC) score of 0.83. Its application to a set of 2,133,027 pVOG-pVOG interactions allowed us to predict 267,265 putative interactions with a reported probability greater than 0.65. At an expected false discovery rate of 0.27, we placed 95.6% of the previously unannotated pVOGs in a functional context, by predicting their interaction with a pVOG that is functionally annotated. CONCLUSIONS: We believe that this proof-of-concept methodology, wrapped in a reproducible and automated workflow, can represent a significant step towards obtaining a more complete picture of bacteriophage biology. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04343-w.
BioMed Central 2021-09-15 /pmc/articles/PMC8442406/ /pubmed/34525942 http://dx.doi.org/10.1186/s12859-021-04343-w Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Methodology Article Pappas, Nikolaos Dutilh, Bas E. Finding functional associations between prokaryotic virus orthologous groups: a proof of concept |
title | Finding functional associations between prokaryotic virus orthologous groups: a proof of concept |
title_full | Finding functional associations between prokaryotic virus orthologous groups: a proof of concept |
title_fullStr | Finding functional associations between prokaryotic virus orthologous groups: a proof of concept |
title_full_unstemmed | Finding functional associations between prokaryotic virus orthologous groups: a proof of concept |
title_short | Finding functional associations between prokaryotic virus orthologous groups: a proof of concept |
title_sort | finding functional associations between prokaryotic virus orthologous groups: a proof of concept |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8442406/ https://www.ncbi.nlm.nih.gov/pubmed/34525942 http://dx.doi.org/10.1186/s12859-021-04343-w |
work_keys_str_mv | AT pappasnikolaos findingfunctionalassociationsbetweenprokaryoticvirusorthologousgroupsaproofofconcept AT dutilhbase findingfunctionalassociationsbetweenprokaryoticvirusorthologousgroupsaproofofconcept |
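
The abstract in the description field reports a probability cutoff, a count of retained interactions, and an expected false discovery rate. The sketch below is not the authors' workflow; it only illustrates, under stated assumptions, how such a cutoff would be applied to scored pVOG pairs and what the quoted figures imply as rough counts. The function names, the toy identifiers, and the toy scores are hypothetical, and the derived counts are rounded estimates rather than values reported in the article.

```python
# Figures quoted in the abstract of this record.
TOTAL_PVOGS = 9518
UNKNOWN_FRACTION = 0.83
CANDIDATE_PAIRS = 2_133_027
PREDICTED_PAIRS = 267_265
PROBABILITY_CUTOFF = 0.65
EXPECTED_FDR = 0.27


def keep_confident(scored_pairs, cutoff=PROBABILITY_CUTOFF):
    """Return the pairs whose predicted probability is strictly above the cutoff."""
    return [(a, b, p) for a, b, p in scored_pairs if p > cutoff]


def summarize():
    """Derive rough counts from the abstract's figures (rounded estimates)."""
    return {
        # About how many pVOGs have no known function.
        "unknown_pvogs": round(TOTAL_PVOGS * UNKNOWN_FRACTION),
        # Share of candidate pairs that passed the probability cutoff.
        "fraction_pairs_kept": PREDICTED_PAIRS / CANDIDATE_PAIRS,
        # Predicted pairs expected to be false at the stated FDR.
        "expected_false_pairs": round(PREDICTED_PAIRS * EXPECTED_FDR),
    }


if __name__ == "__main__":
    # Hypothetical identifiers and scores, invented for illustration only;
    # they are not data from the study.
    toy_scores = [
        ("VOG0001", "VOG0002", 0.91),
        ("VOG0001", "VOG0003", 0.65),
        ("VOG0002", "VOG0003", 0.40),
    ]
    print(keep_confident(toy_scores))
    print(summarize())
```

The comparison is strict because the abstract specifies a probability "greater than 0.65", so a pair scored at exactly 0.65 is dropped.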