Cargando…
Asymptotic Analysis of the kth Subword Complexity
Patterns within strings enable us to extract vital information regarding a string’s randomness. Understanding whether a string is random (Showing no to little repetition in patterns) or periodic (showing repetitions in patterns) are described by a value that is called the kth Subword Complexity of t...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
MDPI
2020
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516637/ https://www.ncbi.nlm.nih.gov/pubmed/33285983 http://dx.doi.org/10.3390/e22020207 |
_version_ | 1783587046947815424 |
---|---|
author | Ahmadi, Lida Ward, Mark Daniel |
author_facet | Ahmadi, Lida Ward, Mark Daniel |
author_sort | Ahmadi, Lida |
collection | PubMed |
description | Patterns within strings enable us to extract vital information regarding a string’s randomness. Understanding whether a string is random (Showing no to little repetition in patterns) or periodic (showing repetitions in patterns) are described by a value that is called the kth Subword Complexity of the character string. By definition, the kth Subword Complexity is the number of distinct substrings of length k that appear in a given string. In this paper, we evaluate the expected value and the second factorial moment (followed by a corollary on the second moment) of the kth Subword Complexity for the binary strings over memory-less sources. We first take a combinatorial approach to derive a probability generating function for the number of occurrences of patterns in strings of finite length. This enables us to have an exact expression for the two moments in terms of patterns’ auto-correlation and correlation polynomials. We then investigate the asymptotic behavior for values of [Formula: see text]. In the proof, we compare the distribution of the kth Subword Complexity of binary strings to the distribution of distinct prefixes of independent strings stored in a trie. The methodology that we use involves complex analysis, analytical poissonization and depoissonization, the Mellin transform, and saddle point analysis. |
format | Online Article Text |
id | pubmed-7516637 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-75166372020-11-09 Asymptotic Analysis of the kth Subword Complexity Ahmadi, Lida Ward, Mark Daniel Entropy (Basel) Article Patterns within strings enable us to extract vital information regarding a string’s randomness. Understanding whether a string is random (Showing no to little repetition in patterns) or periodic (showing repetitions in patterns) are described by a value that is called the kth Subword Complexity of the character string. By definition, the kth Subword Complexity is the number of distinct substrings of length k that appear in a given string. In this paper, we evaluate the expected value and the second factorial moment (followed by a corollary on the second moment) of the kth Subword Complexity for the binary strings over memory-less sources. We first take a combinatorial approach to derive a probability generating function for the number of occurrences of patterns in strings of finite length. This enables us to have an exact expression for the two moments in terms of patterns’ auto-correlation and correlation polynomials. We then investigate the asymptotic behavior for values of [Formula: see text]. In the proof, we compare the distribution of the kth Subword Complexity of binary strings to the distribution of distinct prefixes of independent strings stored in a trie. The methodology that we use involves complex analysis, analytical poissonization and depoissonization, the Mellin transform, and saddle point analysis. MDPI 2020-02-12 /pmc/articles/PMC7516637/ /pubmed/33285983 http://dx.doi.org/10.3390/e22020207 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Ahmadi, Lida Ward, Mark Daniel Asymptotic Analysis of the kth Subword Complexity |
title | Asymptotic Analysis of the kth Subword Complexity |
title_full | Asymptotic Analysis of the kth Subword Complexity |
title_fullStr | Asymptotic Analysis of the kth Subword Complexity |
title_full_unstemmed | Asymptotic Analysis of the kth Subword Complexity |
title_short | Asymptotic Analysis of the kth Subword Complexity |
title_sort | asymptotic analysis of the kth subword complexity |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516637/ https://www.ncbi.nlm.nih.gov/pubmed/33285983 http://dx.doi.org/10.3390/e22020207 |
work_keys_str_mv | AT ahmadilida asymptoticanalysisofthekthsubwordcomplexity AT wardmarkdaniel asymptoticanalysisofthekthsubwordcomplexity |