
Inter-chromosomal k-mer distances

BACKGROUND: Inversion Symmetry is a generalization of the second Chargaff rule, stating that the count of a string of k nucleotides on a single chromosomal strand equals the count of its inverse (reverse-complement) k-mer. It holds for many species, both eukaryotes and prokaryotes, for ranges of k w...


Bibliographic Details
Main Authors: Kafri, Alon; Chor, Benny; Horn, David
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8422766/
https://www.ncbi.nlm.nih.gov/pubmed/34488632
http://dx.doi.org/10.1186/s12864-021-07952-0
author Kafri, Alon
Chor, Benny
Horn, David
collection PubMed
description BACKGROUND: Inversion Symmetry is a generalization of the second Chargaff rule, stating that the count of a string of k nucleotides on a single chromosomal strand equals the count of its inverse (reverse-complement) k-mer. It holds for many species, both eukaryotes and prokaryotes, for ranges of k which may vary from 7 to 10 as chromosomal lengths vary from 2 Mbp to 200 Mbp. Building on this formalism we introduce the concept of k-mer distances between chromosomes. We formulate two k-mer distance measures, D(1) and D(2), which depend on k. D(1) takes into account all k-mers (for a single k) appearing on single strands of the two compared chromosomes, whereas D(2) takes into account both strands of each chromosome. Both measures reflect dissimilarities in global chromosomal structures.
RESULTS: After defining the various distance measures and summarizing their properties, we also define proximities that rely on the existence of synteny blocks between chromosomes of different bacterial strains. Comparing pairs of strains of bacteria, we find negative correlations between synteny proximities and k-mer distances, thus establishing the meaning of the latter as measures of evolutionary distances among bacterial strains. The synteny measures we use are appropriate for closely related bacterial strains, where considerable sections of chromosomes demonstrate high direct or reversed equality. These measures are not appropriate for comparing different bacteria or eukaryotes. K-mer structural distances can be defined for all species. Because of the arbitrariness of strand choices, we employ only the D(2) measure when comparing chromosomes of different species. The results for comparisons of various eukaryotes display interesting behavior which is partially consistent with conventional understanding of evolutionary genomics. In particular, we define ratios of minimal k-mer distances (KDR) between unmasked and masked chromosomes of two species, which correlate with both short and long evolutionary scales.
CONCLUSIONS: k-mer distances reflect dissimilarities among global chromosomal structures. They carry information which aggregates all mutations. As such they can complement traditional evolution studies, which mainly concentrate on coding regions.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-07952-0.
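The abstract names the ingredients of the distance measures (k-mer counts on a single strand, reverse-complement k-mers, and counts over both strands) but does not give the formulas for D(1) or D(2). The following is therefore only a minimal sketch under stated assumptions: the function names are hypothetical, and the L1 distance between normalized k-mer frequencies is a stand-in chosen for illustration, not the authors' definition.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement (the "inverse") of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count k-mers on a single strand, skipping windows with non-ACGT symbols."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def both_strand_counts(seq, k):
    """Count k-mers over both strands: each k-mer also contributes
    one count to its reverse complement."""
    single = kmer_counts(seq, k)
    both = Counter(single)
    for kmer, n in single.items():
        both[reverse_complement(kmer)] += n
    return both


def frequency_distance(counts_a, counts_b):
    """Illustrative L1 distance between normalized k-mer frequencies.
    ASSUMPTION: this is not the paper's D(1) or D(2)."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both sequences must contain at least one valid k-mer")
    kmers = set(counts_a) | set(counts_b)
    return sum(abs(counts_a[m] / total_a - counts_b[m] / total_b) for m in kmers)
```

Under these assumptions, a single-strand comparison would pass `kmer_counts` results to `frequency_distance`, while a comparison that does not depend on the choice of strand would pass `both_strand_counts` results instead. Inversion Symmetry on one chromosome can be probed by comparing `kmer_counts(seq, k)[m]` with `kmer_counts(seq, k)[reverse_complement(m)]` for each k-mer `m`.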
format Online
Article
Text
id pubmed-8422766
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8422766 2021-09-09 Inter-chromosomal k-mer distances Kafri, Alon Chor, Benny Horn, David BMC Genomics Research BioMed Central 2021-09-06 /pmc/articles/PMC8422766/ /pubmed/34488632 http://dx.doi.org/10.1186/s12864-021-07952-0 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Inter-chromosomal k-mer distances
topic Research