Cross chromosomal similarity for DNA sequence compression
Current DNA compression algorithms work by finding similar repeated regions within the DNA sequence and then encoding these regions together to achieve compression. Our study on chromosome sequence similarity reveals that the length of similar repeated regions within one chromosome is about 4.5% of...
Main authors: | Wu, Choi-Ping Paula; Law, Ngai-Fong; Siu, Wan-Chi |
---|---|
Format: | Text |
Language: | English |
Published: | Biomedical Informatics Publishing Group, 2008 |
Subjects: | Hypothesis |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2533061/ https://www.ncbi.nlm.nih.gov/pubmed/18795115 |
_version_ | 1782159016358576128 |
---|---|
author | Wu, Choi-Ping Paula; Law, Ngai-Fong; Siu, Wan-Chi |
author_facet | Wu, Choi-Ping Paula; Law, Ngai-Fong; Siu, Wan-Chi |
author_sort | Wu, Choi-Ping Paula |
collection | PubMed |
description | Current DNA compression algorithms work by finding similar repeated regions within the DNA sequence and then encoding these regions together to achieve compression. Our study on chromosome sequence similarity reveals that the length of similar repeated regions within one chromosome is about 4.5% of the total sequence length. The compression gain is often not high because of these short lengths. It is well known that similarity exists among different regions of chromosome sequences. This implies that similar repeated sequences are found among different regions of chromosome sequences. Here, we study cross-chromosomal similarity for DNA sequence compression. The length and location of similar repeated regions among the sixteen chromosomes of S. cerevisiae are studied. It is found that the average percentage of similar subsequences found between two chromosome sequences is about 10%, of which 8% comes from cross-chromosomal prediction and 2% from self-chromosomal prediction. Among the 16 chromosomes studied, the percentage of similar subsequences is about 18%, of which only 1.2% comes from self-chromosomal prediction while the rest comes from cross-chromosomal prediction. This suggests the importance of cross-chromosomal similarities in addition to self-chromosomal similarities in DNA sequence compression. An additional 23% of storage space could be saved on average by using self-chromosomal and cross-chromosomal predictions in compressing the 16 chromosomes of S. cerevisiae. |
format | Text |
id | pubmed-2533061 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | Biomedical Informatics Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-2533061 2008-09-15 Cross chromosomal similarity for DNA sequence compression Wu, Choi-Ping Paula Law, Ngai-Fong Siu, Wan-Chi Bioinformation Hypothesis Current DNA compression algorithms work by finding similar repeated regions within the DNA sequence and then encoding these regions together to achieve compression. Our study on chromosome sequence similarity reveals that the length of similar repeated regions within one chromosome is about 4.5% of the total sequence length. The compression gain is often not high because of these short lengths. It is well known that similarity exists among different regions of chromosome sequences. This implies that similar repeated sequences are found among different regions of chromosome sequences. Here, we study cross-chromosomal similarity for DNA sequence compression. The length and location of similar repeated regions among the sixteen chromosomes of S. cerevisiae are studied. It is found that the average percentage of similar subsequences found between two chromosome sequences is about 10%, of which 8% comes from cross-chromosomal prediction and 2% from self-chromosomal prediction. Among the 16 chromosomes studied, the percentage of similar subsequences is about 18%, of which only 1.2% comes from self-chromosomal prediction while the rest comes from cross-chromosomal prediction. This suggests the importance of cross-chromosomal similarities in addition to self-chromosomal similarities in DNA sequence compression. An additional 23% of storage space could be saved on average by using self-chromosomal and cross-chromosomal predictions in compressing the 16 chromosomes of S. cerevisiae. Biomedical Informatics Publishing Group 2008-07-14 /pmc/articles/PMC2533061/ /pubmed/18795115 Text en © 2008 Biomedical Informatics Publishing Group This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited. |
spellingShingle | Hypothesis Wu, Choi-Ping Paula Law, Ngai-Fong Siu, Wan-Chi Cross chromosomal similarity for DNA sequence compression |
title | Cross chromosomal similarity for DNA sequence compression |
title_full | Cross chromosomal similarity for DNA sequence compression |
title_fullStr | Cross chromosomal similarity for DNA sequence compression |
title_full_unstemmed | Cross chromosomal similarity for DNA sequence compression |
title_short | Cross chromosomal similarity for DNA sequence compression |
title_sort | cross chromosomal similarity for dna sequence compression |
topic | Hypothesis |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2533061/ https://www.ncbi.nlm.nih.gov/pubmed/18795115 |
work_keys_str_mv | AT wuchoipingpaula crosschromosomalsimilarityfordnasequencecompression AT lawngaifong crosschromosomalsimilarityfordnasequencecompression AT siuwanchi crosschromosomalsimilarityfordnasequencecompression |
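
The description field above distinguishes self-chromosomal prediction (repeats found within one chromosome) from cross-chromosomal prediction (repeats found in another chromosome). A minimal sketch of how such coverage percentages might be estimated follows. It assumes exact k-mer matches as a simplified stand-in for the similar repeated regions the abstract refers to; the function names, the toy sequences, and the choice of `k` are illustrative and are not taken from the paper.

```python
def kmer_positions(seq, k):
    """Map each k-mer in seq to the ascending list of its start positions."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def covered_fraction(target, reference, k, same_sequence=False):
    """Fraction of bases in target that lie inside a k-mer also found in reference.

    With same_sequence=True the reference is the target itself, and a match
    only counts if it starts at an earlier position (self-prediction from
    data that has already been seen).
    """
    if not target:
        return 0.0
    index = kmer_positions(reference, k)
    covered = [False] * len(target)
    for i in range(len(target) - k + 1):
        hits = index.get(target[i:i + k], [])
        if same_sequence:
            # Positions are stored in ascending order, so checking the
            # first one is enough to know whether an earlier copy exists.
            matched = bool(hits) and hits[0] < i
        else:
            matched = bool(hits)
        if matched:
            for p in range(i, i + k):
                covered[p] = True
    return sum(covered) / len(target)


# Toy sequences (hypothetical, not real chromosome data).
chr_a = "ACGTACGTTTGA"
chr_b = "GGACGTACCC"

self_cov = covered_fraction(chr_a, chr_a, k=4, same_sequence=True)
cross_cov = covered_fraction(chr_a, chr_b, k=4)
print(f"self-chromosomal coverage:  {self_cov:.1%}")
print(f"cross-chromosomal coverage: {cross_cov:.1%}")
```

Real repeat detection for compression would normally allow approximate matches and reverse complements, so figures from this exact-match sketch are not comparable to the percentages reported in the record.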