An Optimal Seed Based Compression Algorithm for DNA Sequences

This paper proposes a seed-based lossless compression algorithm for DNA sequences that uses a substitution method similar to the Lempel-Ziv compression scheme. The proposed method exploits the repetition structures inherent in DNA sequences by creating an offline dictionary...

Bibliographic Details
Main authors: Eric, Pamela Vinitha; Gopalakrishnan, Gopakumar; Karunakaran, Muralikrishnan
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4983397/
https://www.ncbi.nlm.nih.gov/pubmed/27555868
http://dx.doi.org/10.1155/2016/3528406
author Eric, Pamela Vinitha
Gopalakrishnan, Gopakumar
Karunakaran, Muralikrishnan
collection PubMed
description This paper proposes a seed-based lossless compression algorithm for DNA sequences that uses a substitution method similar to the Lempel-Ziv compression scheme. The proposed method exploits the repetition structures inherent in DNA sequences by creating an offline dictionary that contains all such repeats along with the details of their mismatches. By ensuring that only promising mismatches are allowed, the method achieves a compression ratio that is on par with or better than that of existing lossless DNA sequence compression algorithms.
format Online
Article
Text
id pubmed-4983397
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-4983397 2016-08-23 An Optimal Seed Based Compression Algorithm for DNA Sequences Eric, Pamela Vinitha Gopalakrishnan, Gopakumar Karunakaran, Muralikrishnan Adv Bioinformatics Research Article This paper proposes a seed-based lossless compression algorithm for DNA sequences that uses a substitution method similar to the Lempel-Ziv compression scheme. The proposed method exploits the repetition structures inherent in DNA sequences by creating an offline dictionary that contains all such repeats along with the details of their mismatches. By ensuring that only promising mismatches are allowed, the method achieves a compression ratio that is on par with or better than that of existing lossless DNA sequence compression algorithms. Hindawi Publishing Corporation 2016 2016-07-31 /pmc/articles/PMC4983397/ /pubmed/27555868 http://dx.doi.org/10.1155/2016/3528406 Text en Copyright © 2016 Pamela Vinitha Eric et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An Optimal Seed Based Compression Algorithm for DNA Sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4983397/
https://www.ncbi.nlm.nih.gov/pubmed/27555868
http://dx.doi.org/10.1155/2016/3528406