Compositional Structure of the Genome: A Review

SIMPLE SUMMARY: DNA structural biology deals with the understanding of DNA and three-dimensional chromatin structure, which can determine its function in the cell. The key structural properties of the DNA fiber, such as stability, flexibility, and susceptibility to damage, largely rely on the compos...

Bibliographic Details
Main Authors: Bernaola-Galván, Pedro; Carpena, Pedro; Gómez-Martín, Cristina; Oliver, Jose L.
Format: Online Article Text
Language: English
Published: MDPI 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10295253/
https://www.ncbi.nlm.nih.gov/pubmed/37372134
http://dx.doi.org/10.3390/biology12060849
author Bernaola-Galván, Pedro
Carpena, Pedro
Gómez-Martín, Cristina
Oliver, Jose L.
collection PubMed
description SIMPLE SUMMARY: DNA structural biology deals with the understanding of DNA and three-dimensional chromatin structure, which can determine its function in the cell. The key structural properties of the DNA fiber, such as stability, flexibility, and susceptibility to damage, largely rely on the composition of the DNA sequence. Variations in the nucleotide sequence result in a patchy chromosome structure, which is formed due to the differential GC content of exons, introns, regulatory elements, repeats, etc. The compositional structure of a genome at different length scales may be revealed via the use of entropic segmentation algorithms or fluctuation analysis of DNA walks. The former algorithms divide the four-symbol nucleotide sequence, or its two-symbol variants, into an array of compositionally homogeneous, non-overlapping domains, isochores, and compositional superstructures, all of which are hierarchically organized in the chromosome. Once the compositional structure of a genome is known, the compositional genome signature or sequence compositional complexity (SCC) can be computed, enabling the comparison of genome structures. ABSTRACT: As the genome carries the historical information of a species’ biotic and environmental interactions, analyzing changes in genome structure over time by using powerful statistical physics methods (such as entropic segmentation algorithms, fluctuation analysis in DNA walks, or measures of compositional complexity) provides valuable insights into genome evolution. Nucleotide frequencies tend to vary along the DNA chain, resulting in a hierarchically patchy chromosome structure with heterogeneities at different length scales that range from a few nucleotides to tens of millions of them. 
Fluctuation analysis reveals that these compositional structures can be classified into three main categories: (1) short-range heterogeneities (below a few kilobase pairs, Kbp), primarily attributed to the alternation of coding and noncoding regions, the density of interspersed or tandem repeats, etc.; (2) isochores, spanning tens to hundreds of tens of Kbp; and (3) superstructures, reaching sizes of tens of megabase pairs (Mbp) or even larger. The isochore and superstructure coordinates obtained for the first complete telomere-to-telomere (T2T) human sequence are now shared in a public database. Interested researchers can therefore use the T2T isochore data, together with the annotations for different genome elements, to test specific hypotheses about genome structure. As at other levels of biological organization, a hierarchical compositional structure is prevalent in the genome. Once the compositional structure of a genome is identified, various measures can be derived to quantify the heterogeneity of that structure. The distribution of segment G+C content has recently been proposed as a new genome signature that is useful for comparing complete genomes. Another meaningful measure is the sequence compositional complexity (SCC), which has been used for genome structure comparisons. Lastly, we review recent genome comparisons in species of the ancient phylum Cyanobacteria, conducted by phylogenetic regression of SCC against time, which have revealed positive trends towards higher genome complexity. These findings provide the first evidence for a driven progressive evolution of genome compositional structure.
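The description above mentions entropic segmentation of the two-symbol variant of a nucleotide sequence. As a rough illustration only, the following Python sketch finds the single cut point that maximizes the Jensen-Shannon divergence between the G+C composition of the left and right parts of a sequence. It is a simplified sketch under stated assumptions, not the algorithm used in the review: the function names `shannon_entropy` and `best_cut` are hypothetical, and a full segmentation procedure would additionally test each cut for statistical significance and apply the split recursively.

```python
import math


def shannon_entropy(gc, n):
    """Entropy in bits of a two-symbol sequence (S = G/C, W = A/T)
    containing `gc` strong bases out of `n` bases in total."""
    if n == 0:
        return 0.0
    h = 0.0
    for count in (gc, n - gc):
        if count > 0:
            p = count / n
            h -= p * math.log2(p)
    return h


def best_cut(seq):
    """Return (position, divergence) for the cut that maximizes the
    Jensen-Shannon divergence between seq[:position] and seq[position:].

    Hypothetical helper for illustration: no significance test is applied,
    and (0, 0.0) is returned when no cut gives a positive divergence.
    """
    n = len(seq)
    is_gc = [1 if base in "GCgc" else 0 for base in seq]
    total_gc = sum(is_gc)
    h_total = shannon_entropy(total_gc, n)

    best_pos, best_js = 0, 0.0
    left_gc = 0
    for i in range(1, n):
        left_gc += is_gc[i - 1]          # G+C count in seq[:i]
        right_gc = total_gc - left_gc    # G+C count in seq[i:]
        # Divergence = entropy of the whole minus the length-weighted
        # entropies of the two candidate segments.
        js = (h_total
              - (i / n) * shannon_entropy(left_gc, i)
              - ((n - i) / n) * shannon_entropy(right_gc, n - i))
        if js > best_js:
            best_pos, best_js = i, js
    return best_pos, best_js


if __name__ == "__main__":
    # Toy sequence: an AT-only block followed by a GC-only block.
    position, divergence = best_cut("AT" * 10 + "GC" * 10)
    print(position, divergence)
```

In this toy input the two halves have completely different compositions, so the cut falls at the boundary between them; on real sequences the divergence at the best cut would have to be compared against a significance threshold before the split is accepted.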
format Online
Article
Text
id pubmed-10295253
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10295253 2023-06-28 Compositional Structure of the Genome: A Review Bernaola-Galván, Pedro Carpena, Pedro Gómez-Martín, Cristina Oliver, Jose L. Biology (Basel) Review MDPI 2023-06-13 /pmc/articles/PMC10295253/ /pubmed/37372134 http://dx.doi.org/10.3390/biology12060849 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Compositional Structure of the Genome: A Review
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10295253/
https://www.ncbi.nlm.nih.gov/pubmed/37372134
http://dx.doi.org/10.3390/biology12060849