
Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis

Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measu...


Bibliographic Details
Main authors: Ré, Miguel A., Azad, Rajeev K.
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3984095/
https://www.ncbi.nlm.nih.gov/pubmed/24728338
http://dx.doi.org/10.1371/journal.pone.0093532
_version_ 1782311397440356352
author Ré, Miguel A.
Azad, Rajeev K.
author_facet Ré, Miguel A.
Azad, Rajeev K.
author_sort Ré, Miguel A.
collection PubMed
description Entropy-based measures have been frequently used in symbolic sequence analysis. The Jensen-Shannon divergence (JSD), a symmetrized and smoothed form of the Kullback-Leibler divergence (relative entropy), is of particular interest because it shares properties with families of other divergence measures and is interpretable in different domains, including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise from a number of attributes, including its generalization to any number of probability distributions and the assignment of weights to the distributions. Furthermore, its entropic formulation allows it to be generalized in different statistical frameworks, such as non-extensive Tsallis statistics and higher-order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes, including those of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. The proposed Tsallis-Markovian generalization, in turn, yielded more pronounced improvements than either the Tsallis or the Markovian generalization alone, specifically when the sequences being compared arose from phylogenetically proximal organisms.
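The weighted, multi-distribution form of JSD mentioned in the description can be illustrated with a minimal sketch. It assumes the standard entropic definition (entropy of the weighted mixture minus the weighted mean of the individual entropies). The function names `base_frequencies`, `shannon_entropy`, `tsallis_entropy` and `jensen_divergence` are hypothetical, and the Tsallis variant shown simply substitutes the Tsallis entropy into the same expression; the exact definitions used in the article may differ.

```python
import math
from typing import Callable, Optional, Sequence


def base_frequencies(seq: str, alphabet: str = "ACGT") -> list:
    """Relative frequency of each alphabet symbol in seq (order-0 composition)."""
    counts = [seq.count(symbol) for symbol in alphabet]
    total = sum(counts)
    return [c / total for c in counts]


def shannon_entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability symbols contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def tsallis_entropy(p: Sequence[float], q: float = 2.0) -> float:
    """Tsallis entropy S_q = (1 - sum p^q) / (q - 1), defined here for q != 1."""
    if q == 1.0:
        raise ValueError("q must differ from 1 in this sketch")
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)


def jensen_divergence(
    dists: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    entropy: Callable[[Sequence[float]], float] = shannon_entropy,
) -> float:
    """Entropy of the weighted mixture minus the weighted mean entropy.

    With the Shannon entropy this is the generalized JSD for any number of
    distributions; passing another entropy function is an assumption made
    for illustration only.
    """
    n = len(dists)
    if weights is None:
        weights = [1.0 / n] * n  # equal weights by default
    size = len(dists[0])
    mixture = [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(size)]
    return entropy(mixture) - sum(w * entropy(d) for w, d in zip(weights, dists))


# Two toy sequences with completely different composition.
p1 = base_frequencies("AAAA")
p2 = base_frequencies("CCCC")
jsd_bits = jensen_divergence([p1, p2])
jt_q2 = jensen_divergence([p1, p2], entropy=tsallis_entropy)
```

In sequence segmentation the weights are commonly set in proportion to the lengths of the subsequences being compared; equal weights are used here only to keep the example short.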
format Online
Article
Text
id pubmed-3984095
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3984095 2014-04-15 Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis Ré, Miguel A. Azad, Rajeev K. PLoS One Research Article Public Library of Science 2014-04-11 /pmc/articles/PMC3984095/ /pubmed/24728338 http://dx.doi.org/10.1371/journal.pone.0093532 Text en © 2014 Ré, Azad http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Ré, Miguel A.
Azad, Rajeev K.
Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis
title Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3984095/
https://www.ncbi.nlm.nih.gov/pubmed/24728338
http://dx.doi.org/10.1371/journal.pone.0093532