Cargando…

The Compressed Vocabulary of Microbial Life

Communication is an undisputed central activity of life that requires an evolving molecular language. It conveys meaning through messages and vocabularies. Here, I explore the existence of a growing vocabulary in the molecules and molecular functions of the microbial world. There are clear correspon...

Descripción completa

Detalles Bibliográficos
Autor principal: Caetano-Anollés, Gustavo
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Frontiers Media S.A. 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8292947/
https://www.ncbi.nlm.nih.gov/pubmed/34305827
http://dx.doi.org/10.3389/fmicb.2021.655990
_version_ 1783724926249730048
author Caetano-Anollés, Gustavo
author_facet Caetano-Anollés, Gustavo
author_sort Caetano-Anollés, Gustavo
collection PubMed
description Communication is an undisputed central activity of life that requires an evolving molecular language. It conveys meaning through messages and vocabularies. Here, I explore the existence of a growing vocabulary in the molecules and molecular functions of the microbial world. There are clear correspondences between the lexicon, syntax, semantics, and pragmatics of language organization and the module, structure, function, and fitness paradigms of molecular biology. These correspondences are constrained by universal laws and engineering principles. Macromolecular structure, for example, follows quantitative linguistic patterns arising from statistical laws that are likely universal, including the Zipf’s law, a special case of the scale-free distribution, the Heaps’ law describing sublinear growth typical of economies of scales, and the Menzerath–Altmann’s law, which imposes size-dependent patterns of decreasing returns. Trade-off solutions between principles of economy, flexibility, and robustness define a “triangle of persistence” describing the impact of the environment on a biological system. The pragmatic landscape of the triangle interfaces with the syntax and semantics of molecular languages, which together with comparative and evolutionary genomic data can explain global patterns of diversification of cellular life. The vocabularies of proteins (proteomes) and functions (functionomes) revealed a significant universal lexical core supporting a universal common ancestor, an ancestral evolutionary link between Bacteria and Eukarya, and distinct reductive evolutionary strategies of language compression in Archaea and Bacteria. A “causal” word cloud strategy inspired by the dependency grammar paradigm used in catenae unfolded the evolution of lexical units associated with Gene Ontology terms at different levels of ontological abstraction. While Archaea holds the smallest, oldest, and most homogeneous vocabulary of all superkingdoms, Bacteria heterogeneously apportions a more complex vocabulary, and Eukarya pushes functional innovation through mechanisms of flexibility and robustness.
format Online
Article
Text
id pubmed-8292947
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-82929472021-07-22 The Compressed Vocabulary of Microbial Life Caetano-Anollés, Gustavo Front Microbiol Microbiology Communication is an undisputed central activity of life that requires an evolving molecular language. It conveys meaning through messages and vocabularies. Here, I explore the existence of a growing vocabulary in the molecules and molecular functions of the microbial world. There are clear correspondences between the lexicon, syntax, semantics, and pragmatics of language organization and the module, structure, function, and fitness paradigms of molecular biology. These correspondences are constrained by universal laws and engineering principles. Macromolecular structure, for example, follows quantitative linguistic patterns arising from statistical laws that are likely universal, including the Zipf’s law, a special case of the scale-free distribution, the Heaps’ law describing sublinear growth typical of economies of scales, and the Menzerath–Altmann’s law, which imposes size-dependent patterns of decreasing returns. Trade-off solutions between principles of economy, flexibility, and robustness define a “triangle of persistence” describing the impact of the environment on a biological system. The pragmatic landscape of the triangle interfaces with the syntax and semantics of molecular languages, which together with comparative and evolutionary genomic data can explain global patterns of diversification of cellular life. The vocabularies of proteins (proteomes) and functions (functionomes) revealed a significant universal lexical core supporting a universal common ancestor, an ancestral evolutionary link between Bacteria and Eukarya, and distinct reductive evolutionary strategies of language compression in Archaea and Bacteria. A “causal” word cloud strategy inspired by the dependency grammar paradigm used in catenae unfolded the evolution of lexical units associated with Gene Ontology terms at different levels of ontological abstraction. While Archaea holds the smallest, oldest, and most homogeneous vocabulary of all superkingdoms, Bacteria heterogeneously apportions a more complex vocabulary, and Eukarya pushes functional innovation through mechanisms of flexibility and robustness. Frontiers Media S.A. 2021-07-07 /pmc/articles/PMC8292947/ /pubmed/34305827 http://dx.doi.org/10.3389/fmicb.2021.655990 Text en Copyright © 2021 Caetano-Anollés. https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Microbiology
Caetano-Anollés, Gustavo
The Compressed Vocabulary of Microbial Life
title The Compressed Vocabulary of Microbial Life
title_full The Compressed Vocabulary of Microbial Life
title_fullStr The Compressed Vocabulary of Microbial Life
title_full_unstemmed The Compressed Vocabulary of Microbial Life
title_short The Compressed Vocabulary of Microbial Life
title_sort compressed vocabulary of microbial life
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8292947/
https://www.ncbi.nlm.nih.gov/pubmed/34305827
http://dx.doi.org/10.3389/fmicb.2021.655990
work_keys_str_mv AT caetanoanollesgustavo thecompressedvocabularyofmicrobiallife
AT caetanoanollesgustavo compressedvocabularyofmicrobiallife