Nucleotide Composition and Codon Usage Across Viruses and Their Respective Hosts
The genetic material of the three domains of life (Bacteria, Archaea, and Eukaryota) is always double-stranded DNA, and their GC content (molar content of guanine plus cytosine) varies between ≈ 13% and ≈ 75%. Nucleotide composition is the simplest way of characterizing genomes. Despite this simplic...
Main authors: | Simón, Diego; Cristina, Juan; Musto, Héctor |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Microbiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8274242/ https://www.ncbi.nlm.nih.gov/pubmed/34262534 http://dx.doi.org/10.3389/fmicb.2021.646300 |
_version_ | 1783721523775799296 |
---|---|
author | Simón, Diego; Cristina, Juan; Musto, Héctor |
author_sort | Simón, Diego |
collection | PubMed |
description | The genetic material of the three domains of life (Bacteria, Archaea, and Eukaryota) is always double-stranded DNA, and their GC content (molar content of guanine plus cytosine) varies between ≈ 13% and ≈ 75%. Nucleotide composition is the simplest way of characterizing genomes. Despite this simplicity, it has several implications. Indeed, it is the main factor that determines, among other features, dinucleotide frequencies, repeated short DNA sequences, and codon and amino acid usage. Which forces drive this strong variation is still a matter of controversy. For rather obvious reasons, most studies of this huge variation and its consequences have been done in free-living organisms. However, no recent comprehensive study of all known viruses (that is, one covering all available sequences) has been done. Viruses, by far the most abundant biological entities on Earth, are the causative agents of many diseases. An overview of these entities is also important because their genetic material is not always double-stranded DNA: certain viruses have single-stranded DNA, double-stranded RNA, or single-stranded RNA as genetic material, and some are retro-transcribing. Therefore, one may wonder whether what we have learned about the evolution of GC content and its implications in prokaryotes and eukaryotes also applies to viruses. In this contribution, we attempt to describe compositional properties of ∼ 10,000 viral species: base composition (globally and according to Baltimore classification), correlations among non-coding regions and the three codon positions, and the relationship between the nucleotide frequencies and codon usage of viruses and those of their hosts. This allowed us to determine that the base composition of phages strongly correlates with that of their respective hosts, while that of eukaryotic viruses does not (with fungi and protists as exceptions). Finally, we discuss some of these results concerning codon usage: reinforcing previous results, we found that phages and hosts exhibit moderate to high correlations, while for eukaryotes and their viruses the correlations are weak or absent. |
format | Online Article Text |
id | pubmed-8274242 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8274242 2021-07-13 Nucleotide Composition and Codon Usage Across Viruses and Their Respective Hosts Simón, Diego; Cristina, Juan; Musto, Héctor Front Microbiol Microbiology Frontiers Media S.A. 2021-06-28 /pmc/articles/PMC8274242/ /pubmed/34262534 http://dx.doi.org/10.3389/fmicb.2021.646300 Text en Copyright © 2021 Simón, Cristina and Musto. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Nucleotide Composition and Codon Usage Across Viruses and Their Respective Hosts |
topic | Microbiology |
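
The description above defines GC content as the molar content of guanine plus cytosine and refers to composition at the three codon positions. The following is a minimal sketch of how those two quantities might be computed for a single sequence. It is not the authors' pipeline: the function names `gc_content` and `gc_by_codon_position` are hypothetical, and the choices to skip ambiguous bases and to treat U as T (so RNA genomes can be handled) are assumptions made for illustration.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G plus C among unambiguous bases (A, C, G, T/U).

    Assumption: ambiguous symbols such as N are skipped, and U is
    treated as T so that RNA sequences are handled the same way.
    """
    counts = Counter(seq.upper().replace("U", "T"))
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


def gc_by_codon_position(cds: str) -> tuple:
    """GC content at codon positions 1, 2 and 3 of an in-frame coding sequence.

    Assumption: the sequence starts in frame; a trailing partial codon
    is ignored.
    """
    usable = len(cds) - len(cds) % 3
    cds = cds[:usable]
    # Slicing with a step of 3 collects every base at one codon position.
    return tuple(gc_content(cds[i::3]) for i in range(3))


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))  # 0.5
    # Codons ATG, GCC, AAA: one of three bases is G or C at positions
    # 1 and 2, and two of three at position 3.
    print(gc_by_codon_position("ATGGCCAAA"))
```

Applied per genome, values like these could then be compared between a virus and its host, which is the kind of relationship the description summarizes.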