Predicting the capsid architecture of phages from metagenomic data
Tailed phages are viruses that infect bacteria and are the most abundant biological entities on Earth. Their ecological, evolutionary, and biogeochemical roles in the planet stem from their genomic diversity. Known tailed phage genomes range from 10 to 735 kilobase pairs thanks to the size variabili...
Main Authors: | Lee, Diana Y.; Bartels, Caitlin; McNair, Katelyn; Edwards, Robert A.; Swairjo, Manal A.; Luque, Antoni |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8814770/ https://www.ncbi.nlm.nih.gov/pubmed/35140890 http://dx.doi.org/10.1016/j.csbj.2021.12.032 |
_version_ | 1784645135626141696 |
---|---|
author | Lee, Diana Y.; Bartels, Caitlin; McNair, Katelyn; Edwards, Robert A.; Swairjo, Manal A.; Luque, Antoni |
author_facet | Lee, Diana Y.; Bartels, Caitlin; McNair, Katelyn; Edwards, Robert A.; Swairjo, Manal A.; Luque, Antoni |
author_sort | Lee, Diana Y. |
collection | PubMed |
description | Tailed phages are viruses that infect bacteria and are the most abundant biological entities on Earth. Their ecological, evolutionary, and biogeochemical roles in the planet stem from their genomic diversity. Known tailed phage genomes range from 10 to 735 kilobase pairs thanks to the size variability of the protective protein capsids that store them. However, the role of tailed phage capsids’ diversity in ecosystems is unclear. A fundamental gap is the difficulty of associating genomic information with viral capsids in the environment. To address this problem, here, we introduce a computational approach to predict the capsid architecture (T-number) of tailed phages using the sequence of a single gene—the major capsid protein. This approach relies on an allometric model that relates the genome length and capsid architecture of tailed phages. This allometric model was applied to isolated phage genomes to generate a library that associated major capsid proteins and putative capsid architectures. This library was used to train machine learning methods, and the most computationally scalable model investigated (random forest) was applied to human gut metagenomes. Compared to isolated phages, the analysis of gut data reveals a large abundance of mid-sized (T = 7) capsids, as expected, followed by a relatively large frequency of jumbo-like tailed phage capsids (T ≥ 25) and small capsids (T = 4) that have been under-sampled. We discussed how to increase the method’s accuracy and how to extend the approach to other viruses. The computational pipeline introduced here opens the doors to monitor the ongoing evolution and selection of viral capsids across ecosystems. |
format | Online Article Text |
id | pubmed-8814770 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-88147702022-02-08 Predicting the capsid architecture of phages from metagenomic data Lee, Diana Y. Bartels, Caitlin McNair, Katelyn Edwards, Robert A. Swairjo, Manal A. Luque, Antoni Comput Struct Biotechnol J Research Article Tailed phages are viruses that infect bacteria and are the most abundant biological entities on Earth. Their ecological, evolutionary, and biogeochemical roles in the planet stem from their genomic diversity. Known tailed phage genomes range from 10 to 735 kilobase pairs thanks to the size variability of the protective protein capsids that store them. However, the role of tailed phage capsids’ diversity in ecosystems is unclear. A fundamental gap is the difficulty of associating genomic information with viral capsids in the environment. To address this problem, here, we introduce a computational approach to predict the capsid architecture (T-number) of tailed phages using the sequence of a single gene—the major capsid protein. This approach relies on an allometric model that relates the genome length and capsid architecture of tailed phages. This allometric model was applied to isolated phage genomes to generate a library that associated major capsid proteins and putative capsid architectures. This library was used to train machine learning methods, and the most computationally scalable model investigated (random forest) was applied to human gut metagenomes. Compared to isolated phages, the analysis of gut data reveals a large abundance of mid-sized (T = 7) capsids, as expected, followed by a relatively large frequency of jumbo-like tailed phage capsids (T ≥ 25) and small capsids (T = 4) that have been under-sampled. We discussed how to increase the method’s accuracy and how to extend the approach to other viruses. The computational pipeline introduced here opens the doors to monitor the ongoing evolution and selection of viral capsids across ecosystems. 
Research Network of Computational and Structural Biotechnology 2022-01-05 /pmc/articles/PMC8814770/ /pubmed/35140890 http://dx.doi.org/10.1016/j.csbj.2021.12.032 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Research Article Lee, Diana Y. Bartels, Caitlin McNair, Katelyn Edwards, Robert A. Swairjo, Manal A. Luque, Antoni Predicting the capsid architecture of phages from metagenomic data |
title | Predicting the capsid architecture of phages from metagenomic data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8814770/ https://www.ncbi.nlm.nih.gov/pubmed/35140890 http://dx.doi.org/10.1016/j.csbj.2021.12.032 |
work_keys_str_mv | AT leedianay predictingthecapsidarchitectureofphagesfrommetagenomicdata AT bartelscaitlin predictingthecapsidarchitectureofphagesfrommetagenomicdata AT mcnairkatelyn predictingthecapsidarchitectureofphagesfrommetagenomicdata AT edwardsroberta predictingthecapsidarchitectureofphagesfrommetagenomicdata AT swairjomanala predictingthecapsidarchitectureofphagesfrommetagenomicdata AT luqueantoni predictingthecapsidarchitectureofphagesfrommetagenomicdata |
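
The description in this record says the approach relies on an allometric model relating genome length to capsid architecture (T-number). The Python sketch below only illustrates that idea and is not the authors' pipeline. It assumes the classical Caspar-Klug rule T = h^2 + hk + k^2 for the allowed T-numbers. The function names and the power-law form with coefficients `a` and `b` are hypothetical placeholders, since this record does not state the fitted model or its parameters.

```python
import math


def valid_t_numbers(t_max):
    """Return the sorted Caspar-Klug T-numbers T = h^2 + h*k + k^2 up to t_max."""
    found = set()
    limit = math.isqrt(t_max)  # h^2 <= T, so h cannot exceed sqrt(t_max)
    for h in range(1, limit + 1):
        for k in range(0, h + 1):  # T is symmetric in h and k, so k <= h suffices
            t = h * h + h * k + k * k
            if t <= t_max:
                found.add(t)
    return sorted(found)


def raw_t_from_genome_length(genome_kbp, a, b):
    """Hypothetical allometric (power-law) form: T_raw = a * genome_kbp ** b.

    a and b are placeholders supplied by the caller; they are NOT values
    taken from the article.
    """
    return a * genome_kbp ** b


def nearest_valid_t(t_raw, t_max=100):
    """Snap a continuous estimate to the closest allowed T-number."""
    return min(valid_t_numbers(t_max), key=lambda t: abs(t - t_raw))


if __name__ == "__main__":
    # The allowed architectures up to T = 25 include the T = 4, 7, and 25
    # capsids mentioned in the description.
    print(valid_t_numbers(25))
```

With fitted coefficients from the article substituted for `a` and `b`, `nearest_valid_t(raw_t_from_genome_length(genome_kbp, a, b))` would give a putative architecture for a genome of known length. The description indicates that labels of this kind were paired with major capsid protein sequences to train the classifier.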