Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes

An efficient DNA compressor furnishes an approximation to measure and compare information quantities present in, between and across DNA sequences, regardless of the characteristics of the sources. In this paper, we compare directly two information measures, the Normalized Compression Distance (NCD)...

Bibliographic Details
Main Authors: Pratas, Diogo; Silva, Raquel M.; Pinho, Armando J.
Format: Online Article Text
Language: English
Published: MDPI 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512912/
https://www.ncbi.nlm.nih.gov/pubmed/33265483
http://dx.doi.org/10.3390/e20060393
author Pratas, Diogo
Silva, Raquel M.
Pinho, Armando J.
collection PubMed
description An efficient DNA compressor furnishes an approximation to measure and compare information quantities present in, between and across DNA sequences, regardless of the characteristics of the sources. In this paper, we compare directly two information measures, the Normalized Compression Distance (NCD) and the Normalized Relative Compression (NRC). These measures answer different questions; the NCD measures how similar both strings are (in terms of information content) and the NRC (which, in general, is nonsymmetric) indicates the fraction of one of them that cannot be constructed using information from the other one. This leads to the problem of finding out which measure (or question) is more suitable for the answer we need. For computing both, we use a state of the art DNA sequence compressor that we benchmark with some top compressors in different compression modes. Then, we apply the compressor on DNA sequences with different scales and natures, first using synthetic sequences and then on real DNA sequences. The last include mitochondrial DNA (mtDNA), messenger RNA (mRNA) and genomic DNA (gDNA) of seven primates. We provide several insights into evolutionary acceleration rates at different scales, namely, the observation and confirmation across the whole genomes of a higher variation rate of the mtDNA relative to the gDNA. We also show the importance of relative compression for localizing similar information regions using mtDNA.
format Online
Article
Text
id pubmed-7512912
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7512912 2020-11-09 Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes Pratas, Diogo Silva, Raquel M. Pinho, Armando J. Entropy (Basel) Article MDPI 2018-05-23 /pmc/articles/PMC7512912/ /pubmed/33265483 http://dx.doi.org/10.3390/e20060393 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes
topic Article