Machine-Learning Classification Suggests That Many Alphaproteobacterial Prophages May Instead Be Gene Transfer Agents

Many of the sequenced bacterial and archaeal genomes encode regions of viral provenance. Yet, not all of these regions encode bona fide viruses. Gene transfer agents (GTAs) are thought to be former viruses that are now maintained in genomes of some bacteria and archaea and are hypothesized to enable...

Bibliographic Details
Main authors: Kogay, Roman, Neely, Taylor B, Birnbaum, Daniel P, Hankel, Camille R, Shakya, Migun, Zhaxybayeva, Olga
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6821227/
https://www.ncbi.nlm.nih.gov/pubmed/31560374
http://dx.doi.org/10.1093/gbe/evz206
collection PubMed
description Many of the sequenced bacterial and archaeal genomes encode regions of viral provenance. Yet, not all of these regions encode bona fide viruses. Gene transfer agents (GTAs) are thought to be former viruses that are now maintained in genomes of some bacteria and archaea and are hypothesized to enable exchange of DNA within bacterial populations. In Alphaproteobacteria, genes homologous to the “head–tail” gene cluster that encodes structural components of the Rhodobacter capsulatus GTA (RcGTA) are found in many taxa, even if they are only distantly related to Rhodobacter capsulatus. Yet, in most genomes available in GenBank RcGTA-like genes have annotations of typical viral proteins, and therefore are not easily distinguished from their viral homologs without additional analyses. Here, we report a “support vector machine” classifier that quickly and accurately distinguishes RcGTA-like genes from their viral homologs by capturing the differences in the amino acid composition of the encoded proteins. Our open-source classifier is implemented in Python and can be used to scan homologs of the RcGTA genes in newly sequenced genomes. The classifier can also be trained to identify other types of GTAs, or even to detect other elements of viral ancestry. Using the classifier trained on a manually curated set of homologous viruses and GTAs, we detected RcGTA-like “head–tail” gene clusters in 57.5% of the 1,423 examined alphaproteobacterial genomes. We also demonstrated that more than half of the in silico prophage predictions are instead likely to be GTAs, suggesting that in many alphaproteobacterial genomes the RcGTA-like elements remain unrecognized.
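The abstract states that the classifier separates RcGTA-like genes from viral homologs by capturing differences in the amino acid composition of the encoded proteins. As a minimal sketch of what such a feature could look like, the snippet below turns a protein sequence into a 20-element vector of relative amino acid frequencies. The function name `amino_acid_composition`, the fixed residue order, and the toy sequence are assumptions made for illustration; this record does not describe the exact features or code the authors used.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes) in a fixed order,
# so every protein maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(protein):
    """Return the relative frequency of each standard amino acid.

    Non-standard symbols (e.g. 'X', '*', gaps) are ignored.
    """
    seq = protein.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not taken from any real genome.
toy_protein = "MKKLAAGG"
features = amino_acid_composition(toy_protein)
```

Vectors of this kind, computed for labeled GTA and viral homologs, could then be passed to any support vector machine implementation (for example `sklearn.svm.SVC`, if scikit-learn is available) for training.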
format Online
Article
Text
id pubmed-6821227
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6821227 2019-11-04 Genome Biol Evol Research Article
Oxford University Press 2019-09-27 /pmc/articles/PMC6821227/ /pubmed/31560374 http://dx.doi.org/10.1093/gbe/evz206 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article