Cargando…

An Efficient Coding Technique for Stochastic Processes

In the framework of coding theory, under the assumption of a Markov process [Formula: see text] on a finite alphabet [Formula: see text] the compressed representation of the data will be composed of a description of the model used to code the data and the encoded data. Given the model, the Huffman’s...

Descripción completa

Detalles Bibliográficos
Autores principales: Garca, Jesús E., González-López, Verónica A., Tasca, Gustavo H., Yaginuma, Karina Y.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8775014/
https://www.ncbi.nlm.nih.gov/pubmed/35052091
http://dx.doi.org/10.3390/e24010065
_version_ 1784636479743459328
author Garca, Jesús E.
González-López, Verónica A.
Tasca, Gustavo H.
Yaginuma, Karina Y.
author_facet Garca, Jesús E.
González-López, Verónica A.
Tasca, Gustavo H.
Yaginuma, Karina Y.
author_sort Garca, Jesús E.
collection PubMed
description In the framework of coding theory, under the assumption of a Markov process [Formula: see text] on a finite alphabet [Formula: see text] the compressed representation of the data will be composed of a description of the model used to code the data and the encoded data. Given the model, the Huffman’s algorithm is optimal for the number of bits needed to encode the data. On the other hand, modeling [Formula: see text] through a Partition Markov Model (PMM) promotes a reduction in the number of transition probabilities needed to define the model. This paper shows how the use of Huffman code with a PMM reduces the number of bits needed in this process. We prove the estimation of a PMM allows for estimating the entropy of [Formula: see text] , providing an estimator of the minimum expected codeword length per symbol. We show the efficiency of the new methodology on a simulation study and, through a real problem of compression of DNA sequences of SARS-CoV-2, obtaining in the real data at least a reduction of 10.4%.
format Online
Article
Text
id pubmed-8775014
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-87750142022-01-21 An Efficient Coding Technique for Stochastic Processes Garca, Jesús E. González-López, Verónica A. Tasca, Gustavo H. Yaginuma, Karina Y. Entropy (Basel) Article In the framework of coding theory, under the assumption of a Markov process [Formula: see text] on a finite alphabet [Formula: see text] the compressed representation of the data will be composed of a description of the model used to code the data and the encoded data. Given the model, the Huffman’s algorithm is optimal for the number of bits needed to encode the data. On the other hand, modeling [Formula: see text] through a Partition Markov Model (PMM) promotes a reduction in the number of transition probabilities needed to define the model. This paper shows how the use of Huffman code with a PMM reduces the number of bits needed in this process. We prove the estimation of a PMM allows for estimating the entropy of [Formula: see text] , providing an estimator of the minimum expected codeword length per symbol. We show the efficiency of the new methodology on a simulation study and, through a real problem of compression of DNA sequences of SARS-CoV-2, obtaining in the real data at least a reduction of 10.4%. MDPI 2021-12-30 /pmc/articles/PMC8775014/ /pubmed/35052091 http://dx.doi.org/10.3390/e24010065 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Garca, Jesús E.
González-López, Verónica A.
Tasca, Gustavo H.
Yaginuma, Karina Y.
An Efficient Coding Technique for Stochastic Processes
title An Efficient Coding Technique for Stochastic Processes
title_full An Efficient Coding Technique for Stochastic Processes
title_fullStr An Efficient Coding Technique for Stochastic Processes
title_full_unstemmed An Efficient Coding Technique for Stochastic Processes
title_short An Efficient Coding Technique for Stochastic Processes
title_sort efficient coding technique for stochastic processes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8775014/
https://www.ncbi.nlm.nih.gov/pubmed/35052091
http://dx.doi.org/10.3390/e24010065
work_keys_str_mv AT garcajesuse anefficientcodingtechniqueforstochasticprocesses
AT gonzalezlopezveronicaa anefficientcodingtechniqueforstochasticprocesses
AT tascagustavoh anefficientcodingtechniqueforstochasticprocesses
AT yaginumakarinay anefficientcodingtechniqueforstochasticprocesses
AT garcajesuse efficientcodingtechniqueforstochasticprocesses
AT gonzalezlopezveronicaa efficientcodingtechniqueforstochasticprocesses
AT tascagustavoh efficientcodingtechniqueforstochasticprocesses
AT yaginumakarinay efficientcodingtechniqueforstochasticprocesses