Nucleotide Composition and Codon Usage Across Viruses and Their Respective Hosts

The genetic material of the three domains of life (Bacteria, Archaea, and Eukaryota) is always double-stranded DNA, and their GC content (molar content of guanine plus cytosine) varies between ≈ 13% and ≈ 75%. Nucleotide composition is the simplest way of characterizing genomes. Despite this simplic...

Bibliographic Details
Main authors: Simón, Diego; Cristina, Juan; Musto, Héctor
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8274242/
https://www.ncbi.nlm.nih.gov/pubmed/34262534
http://dx.doi.org/10.3389/fmicb.2021.646300
author Simón, Diego
Cristina, Juan
Musto, Héctor
collection PubMed
description The genetic material of the three domains of life (Bacteria, Archaea, and Eukaryota) is always double-stranded DNA, and their GC content (molar content of guanine plus cytosine) varies between ≈ 13% and ≈ 75%. Nucleotide composition is the simplest way of characterizing genomes. Despite this simplicity, it has several implications. Indeed, it is the main factor that determines, among other features, dinucleotide frequencies, repeated short DNA sequences, and codon and amino acid usage. Which forces drive this strong variation is still a matter of controversy. For rather obvious reasons, most of the studies concerning this huge variation and its consequences have been done in free-living organisms. However, no recent comprehensive study of all known viruses (that is, one covering all available sequences) has been done. Viruses, by far the most abundant biological entities on Earth, are the causative agents of many diseases. An overview of these entities is also important because their genetic material is not always double-stranded DNA: certain viruses have single-stranded DNA, double-stranded RNA, or single-stranded RNA as genetic material, and some are retro-transcribing. Therefore, one may wonder whether what we have learned about the evolution of GC content and its implications in prokaryotes and eukaryotes also applies to viruses. In this contribution, we attempt to describe compositional properties of ∼ 10,000 viral species: base composition (globally and according to Baltimore classification), correlations among non-coding regions and the three codon positions, and the relationship of the nucleotide frequencies and codon usage of viruses with the same features of their hosts. This allowed us to determine that the base composition of phages correlates strongly with that of their respective hosts, while that of eukaryotic viruses does not (with the viruses of fungi and protists as exceptions). Finally, we discuss some of these results concerning codon usage: reinforcing previous results, we found that phages and hosts exhibit moderate to high correlations, while for eukaryotes and their viruses the correlations are weak or do not exist.
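To make the quantities named in the description concrete, here is a minimal Python sketch of GC content and of GC content at each of the three codon positions. It is an illustration only, not the authors' pipeline: the function names gc_content and gc_by_codon_position are hypothetical, ambiguous bases such as N are assumed to be ignored, and the coding sequence is assumed to start in frame.

```python
from collections import Counter
from typing import Tuple


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T or U) that are G or C.

    Assumption: ambiguous symbols such as N are skipped rather than counted.
    Returns NaN for a sequence with no unambiguous bases.
    """
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"] + counts["U"]
    return gc / total if total else float("nan")


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """GC content at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    cds = cds[:usable]
    # Slicing with a step of 3 collects every first, second or third codon base.
    return (gc_content(cds[0::3]), gc_content(cds[1::3]), gc_content(cds[2::3]))


if __name__ == "__main__":
    print(gc_content("ATGC"))              # 0.5
    print(gc_by_codon_position("ATGGCC"))  # (0.5, 0.5, 1.0)
```

Computed per genome for a virus and for its host, values such as these are the kind of input on which virus-host correlations in base composition could be evaluated; the record does not state which correlation statistic the study used.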
format Online
Article
Text
id pubmed-8274242
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8274242 2021-07-13 Nucleotide Composition and Codon Usage Across Viruses and Their Respective Hosts Simón, Diego Cristina, Juan Musto, Héctor Front Microbiol Microbiology Frontiers Media S.A. 2021-06-28 /pmc/articles/PMC8274242/ /pubmed/34262534 http://dx.doi.org/10.3389/fmicb.2021.646300 Text en Copyright © 2021 Simón, Cristina and Musto. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Nucleotide Composition and Codon Usage Across Viruses and Their Respective Hosts
topic Microbiology