
Rapid discovery of novel prophages using biological feature engineering and machine learning



Bibliographic Details
Main authors: Sirén, Kimmo; Millard, Andrew; Petersen, Bent; Gilbert, M Thomas P; Clokie, Martha R J; Sicheritz-Pontén, Thomas
Format: Online article (text)
Language: English
Published: Oxford University Press, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7787355/
https://www.ncbi.nlm.nih.gov/pubmed/33575651
http://dx.doi.org/10.1093/nargab/lqaa109
collection PubMed
description Prophages are phages that are integrated into bacterial genomes and which are key to understanding many aspects of bacterial biology. Their extreme diversity means they are challenging to detect using sequence similarity, yet this remains the paradigm and thus many phages remain unidentified. We present a novel, fast and generalizing machine learning method based on feature space to facilitate novel prophage discovery. To validate the approach, we reanalyzed publicly available marine viromes and single-cell genomes using our feature-based approaches and found consistently more phages than were detected using current state-of-the-art tools while being notably faster. This demonstrates that our approach significantly enhances bacteriophage discovery and thus provides a new starting point for exploring new biologies.
format Online
Article
Text
id pubmed-7787355
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7787355 (2021-02-10). NAR Genom Bioinform, Methods Article. Oxford University Press, published online 2021-01-06. Text, en. © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Methods Article