
RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure

BACKGROUND: With the rapid emergence of RNA databases and newly identified non-coding RNAs, an efficient compression algorithm for RNA sequence and structural information is needed for the storage and analysis of such data. Although several algorithms for compressing DNA sequences have been proposed...


Bibliographic Details
Main Authors: Liu, Qi; Yang, Yu; Chen, Chun; Bu, Jiajun; Zhang, Yin; Ye, Xiuzi
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2335284/
https://www.ncbi.nlm.nih.gov/pubmed/18373878
http://dx.doi.org/10.1186/1471-2105-9-176
author Liu, Qi
Yang, Yu
Chen, Chun
Bu, Jiajun
Zhang, Yin
Ye, Xiuzi
collection PubMed
description BACKGROUND: With the rapid emergence of RNA databases and newly identified non-coding RNAs, an efficient compression algorithm for RNA sequence and structural information is needed for the storage and analysis of such data. Although several algorithms for compressing DNA sequences have been proposed, none is suitable for compressing RNA sequences together with their secondary structures. Such compression not only facilitates the maintenance of RNA data, but also supplies a novel way to measure the informational complexity of RNA structural data. This raises the possibility of studying, on the basis of compression, the relationship between the functional activities of RNA structures and their complexities, as well as various structural properties of RNA.
RESULTS: RNACompress employs an efficient grammar-based model to compress RNA sequences and their secondary structures. The algorithm has two main goals: (1) to provide a robust and effective method for compressing RNA structural data; and (2) to design a suitable model for representing RNA secondary structure and for deriving the informational complexity of the structural data from its compression. Our extensive tests have shown that RNACompress achieves a universally better compression ratio than other sequence-specific or general text compression algorithms, such as GenCompress, WinRAR and gzip. Moreover, a test comparing the activities of distinct GTP-binding RNAs (aptamers) with their structural complexity shows that our defined informational complexity can be used to describe how complexity varies with activity. These results lead to an objective means of comparing the functional properties of heteropolymers from an information perspective.
CONCLUSION: This paper discusses a universal algorithm for compressing RNA secondary structure and evaluating its informational complexity. We have developed RNACompress as a useful tool for academic users. Extensive tests have shown that RNACompress is a universally efficient algorithm for compressing RNA sequences with their secondary structures. RNACompress also serves as a good measure of the informational complexity of RNA secondary structure, which can be used to study the functional activities of RNA molecules.
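The idea of using compressed size as a proxy for informational complexity can be illustrated with a general-purpose compressor. The sketch below is not the grammar-based model of RNACompress, whose details this record does not give. It uses Python's standard `zlib` module (DEFLATE, the method behind gzip, one of the baselines named in the abstract) and assumes the secondary structure is supplied in dot-bracket notation. That input format and the function names `is_balanced` and `bits_per_symbol` are illustrative assumptions, not taken from the paper.

```python
import zlib


def is_balanced(structure: str) -> bool:
    """Check that a dot-bracket string uses only '(', ')' and '.' and that
    every opening bracket is closed (assumed input format, no pseudoknots)."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch != ".":
            return False
    return depth == 0


def bits_per_symbol(sequence: str, structure: str) -> float:
    """Compressed size in bits divided by the number of input symbols
    (bases plus structure characters). Lower values mean the data are more
    compressible, i.e. less complex under this generic baseline."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    if not is_balanced(structure):
        raise ValueError("structure is not a balanced dot-bracket string")
    payload = (sequence + "\n" + structure).encode("ascii")
    compressed = zlib.compress(payload, 9)  # 9 = strongest compression level
    return 8 * len(compressed) / (2 * len(sequence))


if __name__ == "__main__":
    # A toy hairpin, written once and then repeated 50 times.
    print(bits_per_symbol("GGGAAACCC", "(((...)))"))
    print(bits_per_symbol("GGGAAACCC" * 50, "(((...)))" * 50))
```

On very short inputs the fixed overhead of the compressor dominates, so such a baseline is only informative for comparing inputs of similar length; a structure-aware model like the one the abstract describes is designed to avoid that limitation.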
format Text
id pubmed-2335284
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2335284 2008-04-28 BMC Bioinformatics Methodology Article BioMed Central 2008-03-31 /pmc/articles/PMC2335284/ /pubmed/18373878 http://dx.doi.org/10.1186/1471-2105-9-176 Text en Copyright © 2008 Liu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure
topic Methodology Article