MArVD2: a machine learning enhanced tool to discriminate between archaeal and bacterial viruses in viral datasets

Our knowledge of viral sequence space has exploded with advancing sequencing technologies and large-scale sampling and analytical efforts. Though archaea are important and abundant prokaryotes in many systems, our knowledge of archaeal viruses outside of extreme environments is limited. This largely...

Bibliographic Details
Main authors: Vik, Dean; Bolduc, Benjamin; Roux, Simon; Sun, Christine L.; Pratama, Akbar Adjie; Krupovic, Mart; Sullivan, Matthew B.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10449787/
https://www.ncbi.nlm.nih.gov/pubmed/37620369
http://dx.doi.org/10.1038/s43705-023-00295-9
author Vik, Dean
Bolduc, Benjamin
Roux, Simon
Sun, Christine L.
Pratama, Akbar Adjie
Krupovic, Mart
Sullivan, Matthew B.
collection PubMed
description Our knowledge of viral sequence space has exploded with advancing sequencing technologies and large-scale sampling and analytical efforts. Though archaea are important and abundant prokaryotes in many systems, our knowledge of archaeal viruses outside of extreme environments is limited. This largely stems from the lack of a robust, high-throughput, and systematic way to distinguish between bacterial and archaeal viruses in datasets of curated viruses. Here we upgrade our prior text-based tool (MArVD) via training and testing a random forest machine learning algorithm against a newly curated dataset of archaeal viruses. After optimization, MArVD2 presented a significant improvement over its predecessor in terms of scalability, usability, and flexibility, and will allow user-defined custom training datasets as archaeal virus discovery progresses. Benchmarking showed that a model trained with viral sequences from the hypersaline, marine, and hot spring environments correctly classified 85% of the archaeal viruses with a false detection rate below 2% using a random forest prediction threshold of 80% in a separate benchmarking dataset from the same habitats.
format Online
Article
Text
id pubmed-10449787
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10449787 2023-08-26 MArVD2: a machine learning enhanced tool to discriminate between archaeal and bacterial viruses in viral datasets. Vik, Dean; Bolduc, Benjamin; Roux, Simon; Sun, Christine L.; Pratama, Akbar Adjie; Krupovic, Mart; Sullivan, Matthew B. ISME Commun. Article.
Nature Publishing Group UK 2023-08-24 /pmc/articles/PMC10449787/ /pubmed/37620369 http://dx.doi.org/10.1038/s43705-023-00295-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title MArVD2: a machine learning enhanced tool to discriminate between archaeal and bacterial viruses in viral datasets
topic Article