Predicting the capsid architecture of phages from metagenomic data

Tailed phages are viruses that infect bacteria and are the most abundant biological entities on Earth. Their ecological, evolutionary, and biogeochemical roles on the planet stem from their genomic diversity. Known tailed phage genomes range from 10 to 735 kilobase pairs thanks to the size variabili...

Bibliographic Details
Main authors: Lee, Diana Y., Bartels, Caitlin, McNair, Katelyn, Edwards, Robert A., Swairjo, Manal A., Luque, Antoni
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8814770/
https://www.ncbi.nlm.nih.gov/pubmed/35140890
http://dx.doi.org/10.1016/j.csbj.2021.12.032
author Lee, Diana Y.
Bartels, Caitlin
McNair, Katelyn
Edwards, Robert A.
Swairjo, Manal A.
Luque, Antoni
collection PubMed
description Tailed phages are viruses that infect bacteria and are the most abundant biological entities on Earth. Their ecological, evolutionary, and biogeochemical roles on the planet stem from their genomic diversity. Known tailed phage genomes range from 10 to 735 kilobase pairs thanks to the size variability of the protective protein capsids that store them. However, the role of tailed phage capsid diversity in ecosystems is unclear. A fundamental gap is the difficulty of associating genomic information with viral capsids in the environment. To address this problem, we introduce a computational approach to predict the capsid architecture (T-number) of tailed phages using the sequence of a single gene, the major capsid protein. This approach relies on an allometric model that relates the genome length and capsid architecture of tailed phages. This allometric model was applied to isolated phage genomes to generate a library that associated major capsid proteins with putative capsid architectures. This library was used to train machine learning methods, and the most computationally scalable model investigated (random forest) was applied to human gut metagenomes. Compared to isolated phages, the analysis of gut data reveals a large abundance of mid-sized (T = 7) capsids, as expected, followed by a relatively large frequency of jumbo-like tailed phage capsids (T ≥ 25) and small capsids (T = 4) that have been under-sampled. We discuss how to increase the method’s accuracy and how to extend the approach to other viruses. The computational pipeline introduced here opens the door to monitoring the ongoing evolution and selection of viral capsids across ecosystems.
format Online
Article
Text
id pubmed-8814770
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-8814770 2022-02-08 Predicting the capsid architecture of phages from metagenomic data Lee, Diana Y. Bartels, Caitlin McNair, Katelyn Edwards, Robert A. Swairjo, Manal A. Luque, Antoni Comput Struct Biotechnol J Research Article
Research Network of Computational and Structural Biotechnology 2022-01-05 /pmc/articles/PMC8814770/ /pubmed/35140890 http://dx.doi.org/10.1016/j.csbj.2021.12.032 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Predicting the capsid architecture of phages from metagenomic data
topic Research Article