An Efficient Coding Technique for Stochastic Processes
Main authors: | , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8775014/ https://www.ncbi.nlm.nih.gov/pubmed/35052091 http://dx.doi.org/10.3390/e24010065 |
Summary: | In the framework of coding theory, under the assumption of a Markov process [Formula: see text] on a finite alphabet [Formula: see text], the compressed representation of the data is composed of a description of the model used to code the data and the encoded data. Given the model, Huffman's algorithm is optimal in the number of bits needed to encode the data. On the other hand, modeling [Formula: see text] through a Partition Markov Model (PMM) reduces the number of transition probabilities needed to define the model. This paper shows how using a Huffman code with a PMM reduces the number of bits needed in this process. We prove that estimating a PMM allows for estimating the entropy of [Formula: see text], providing an estimator of the minimum expected codeword length per symbol. We show the efficiency of the new methodology in a simulation study and on a real problem, the compression of DNA sequences of SARS-CoV-2, obtaining on the real data a reduction of at least 10.4%. |
---|
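
The summary relates Huffman coding to entropy as the minimum expected codeword length per symbol. The sketch below illustrates that relationship only for a memoryless toy sequence. It is not the paper's PMM-based procedure, which this record does not describe in detail. The function names `huffman_code_lengths` and `empirical_entropy` and the sample string are hypothetical and chosen for illustration.

```python
import heapq
import math
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: codeword length in bits} for a Huffman code built from counts."""
    if len(freqs) == 1:
        # A single-symbol alphabet still needs one bit per symbol.
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth in the subtree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def empirical_entropy(freqs):
    """Shannon entropy in bits per symbol of the empirical distribution."""
    total = sum(freqs.values())
    return -sum((c / total) * math.log2(c / total) for c in freqs.values())


sequence = "aaaaccgt"  # toy data (assumption), not taken from the paper
freqs = Counter(sequence)
lengths = huffman_code_lengths(freqs)
mean_length = sum(freqs[s] * lengths[s] for s in freqs) / len(sequence)
entropy = empirical_entropy(freqs)

print(lengths, mean_length, entropy)
```

For this toy sequence the symbol probabilities are powers of 1/2, so the mean Huffman codeword length equals the empirical entropy. In general the mean length lies between the entropy and the entropy plus one bit.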