Inter-chromosomal k-mer distances
Main authors: | Kafri, Alon; Chor, Benny; Horn, David |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8422766/ https://www.ncbi.nlm.nih.gov/pubmed/34488632 http://dx.doi.org/10.1186/s12864-021-07952-0 |
author | Kafri, Alon; Chor, Benny; Horn, David |
---|---|
collection | PubMed |
description | BACKGROUND: Inversion Symmetry is a generalization of the second Chargaff rule, stating that the count of a string of k nucleotides on a single chromosomal strand equals the count of its inverse (reverse-complement) k-mer. It holds for many species, both eukaryotes and prokaryotes, for ranges of k which may vary from 7 to 10 as chromosomal lengths vary from 2 Mbp to 200 Mbp. Building on this formalism we introduce the concept of k-mer distances between chromosomes. We formulate two k-mer distance measures, D(1) and D(2), which depend on k. D(1) takes into account all k-mers (for a single k) appearing on single strands of the two compared chromosomes, whereas D(2) takes into account both strands of each chromosome. Both measures reflect dissimilarities in global chromosomal structures. RESULTS: After defining the various distance measures and summarizing their properties, we also define proximities that rely on the existence of synteny blocks between chromosomes of different bacterial strains. Comparing pairs of strains of bacteria, we find negative correlations between synteny proximities and k-mer distances, thus establishing the meaning of the latter as measures of evolutionary distances among bacterial strains. The synteny measures we use are appropriate for closely related bacterial strains, where considerable sections of chromosomes demonstrate high direct or reversed equality. These measures are not appropriate for comparing different bacteria or eukaryotes. K-mer structural distances can be defined for all species. Because of the arbitrariness of strand choices, we employ only the D(2) measure when comparing chromosomes of different species. The results for comparisons of various eukaryotes display interesting behavior which is partially consistent with conventional understanding of evolutionary genomics. 
In particular, we define ratios of minimal k-mer distances (KDR) between unmasked and masked chromosomes of two species, which correlate with both short and long evolutionary scales. CONCLUSIONS: k-mer distances reflect dissimilarities among global chromosomal structures. They carry information which aggregates all mutations. As such they can complement traditional evolution studies, which mainly concentrate on coding regions. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-07952-0. |
format | Online Article Text |
id | pubmed-8422766 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8422766 2021-09-09 Inter-chromosomal k-mer distances Kafri, Alon; Chor, Benny; Horn, David BMC Genomics Research BioMed Central 2021-09-06 /pmc/articles/PMC8422766/ /pubmed/34488632 http://dx.doi.org/10.1186/s12864-021-07952-0 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Inter-chromosomal k-mer distances |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8422766/ https://www.ncbi.nlm.nih.gov/pubmed/34488632 http://dx.doi.org/10.1186/s12864-021-07952-0 |
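The inversion-symmetry property stated in the abstract (the count of a k-mer on one strand equals the count of its reverse complement) can be checked with a short k-mer counter. The sketch below is illustrative only and is not the authors' code. The deviation score and the both-strand total-variation distance are hypothetical stand-ins chosen for illustration. They are not the paper's D(1) or D(2) definitions, which this record does not give. Pooling counts from both strands makes the distance independent of which strand is picked, which is the reason the abstract gives for using only D(2) across species.

```python
from collections import Counter

# Complement map for the four DNA bases; other symbols (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement ("inverse") of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on a single strand, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if set(kmer) <= _BASES:
            counts[kmer] += 1
    return counts


def inversion_symmetry_deviation(seq: str, k: int) -> float:
    """Hypothetical score in [0, 1]: 0.0 when every k-mer occurs as often as its inverse."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Include inverses that never occur, so their zero counts are compared too.
    keys = set(counts) | {reverse_complement(w) for w in counts}
    diff = sum(abs(counts[w] - counts[reverse_complement(w)]) for w in keys)
    # Each mismatched pair (w, inverse of w) is counted twice, hence the factor 2.
    return diff / (2 * total)


def both_strand_frequencies(seq: str, k: int) -> dict:
    """k-mer frequencies pooled over a sequence and its reverse complement."""
    counts = kmer_counts(seq, k) + kmer_counts(reverse_complement(seq), k)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def strand_independent_distance(seq_a: str, seq_b: str, k: int) -> float:
    """Hypothetical stand-in (not the paper's D(2)): total-variation distance
    between both-strand k-mer frequency profiles."""
    fa = both_strand_frequencies(seq_a, k)
    fb = both_strand_frequencies(seq_b, k)
    keys = set(fa) | set(fb)
    return 0.5 * sum(abs(fa.get(w, 0.0) - fb.get(w, 0.0)) for w in keys)


if __name__ == "__main__":
    chrom = "ATGCGTACGTTAGCCGATCGATCGGCTAACGTACGCAT"  # toy sequence, not real data
    print(inversion_symmetry_deviation(chrom, 3))
    # A sequence and its reverse complement give identical pooled profiles.
    print(strand_independent_distance(chrom, reverse_complement(chrom), 3))  # 0.0
```

Because the pooled counts of a sequence and of its reverse complement are identical, the second printed value is exactly 0.0. Real chromosomes would be read from FASTA files rather than given as string literals.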