
An Optimal Seed Based Compression Algorithm for DNA Sequences


Bibliographic Details
Main authors: Eric, Pamela Vinitha; Gopalakrishnan, Gopakumar; Karunakaran, Muralikrishnan
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4983397/
https://www.ncbi.nlm.nih.gov/pubmed/27555868
http://dx.doi.org/10.1155/2016/3528406
Description
Summary: This paper proposes a seed-based lossless compression algorithm for DNA sequences that uses a substitution method similar to the Lempel-Ziv compression scheme. The proposed method exploits the repetition structures inherent in DNA sequences by creating an offline dictionary that contains all such repeats along with the details of their mismatches. By ensuring that only promising mismatches are allowed, the method achieves a compression ratio on par with or better than that of existing lossless DNA sequence compression algorithms.
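
The summary does not spell out the algorithm, so the following is only a minimal sketch of the general idea (seed lookup, extension with a bounded number of mismatches, substitution by a back-reference), not the authors' method. The names `compress` and `decompress`, the parameters `seed_len` and `max_mismatches`, the token layout, and the rule that a mismatch is kept only when matching bases follow it are all assumptions made for illustration. The sketch also builds its seed index incrementally from the already-encoded prefix, whereas the paper describes an offline dictionary.

```python
from typing import Dict, List, Tuple, Union

# A token is either a literal base ("A", "C", "G", "T") or a repeat reference:
# (source_start, length, [(offset_within_repeat, substituted_base), ...]).
Repeat = Tuple[int, int, List[Tuple[int, str]]]
Token = Union[str, Repeat]


def compress(seq: str, seed_len: int = 4, max_mismatches: int = 2) -> List[Token]:
    """Hypothetical seed-and-extend encoder (illustrative assumption only)."""
    index: Dict[str, int] = {}  # seed -> earliest start position in the prefix
    tokens: List[Token] = []
    next_to_index = 0
    i, n = 0, len(seq)
    while i < n:
        # Index every seed that lies entirely inside the encoded prefix seq[:i].
        while next_to_index + seed_len <= i:
            seed = seq[next_to_index:next_to_index + seed_len]
            index.setdefault(seed, next_to_index)
            next_to_index += 1
        src = index.get(seq[i:i + seed_len]) if i + seed_len <= n else None
        if src is None:
            tokens.append(seq[i])  # no earlier copy of this seed: emit a literal
            i += 1
            continue
        # Extend the exact seed match, tolerating a bounded number of mismatches.
        length = seed_len
        mismatches: List[Tuple[int, str]] = []
        while i + length < n and src + length < i:
            if seq[src + length] != seq[i + length]:
                if len(mismatches) == max_mismatches:
                    break
                mismatches.append((length, seq[i + length]))
            length += 1
        # Assumed stand-in for "promising": drop mismatches at the very end,
        # since they are not followed by any further matching bases.
        while mismatches and mismatches[-1][0] == length - 1:
            mismatches.pop()
            length -= 1
        tokens.append((src, length, mismatches))
        i += length
    return tokens


def decompress(tokens: List[Token]) -> str:
    """Rebuild the sequence by copying earlier output and patching mismatches."""
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)
        else:
            src, length, mismatches = tok
            chunk = out[src:src + length]
            for offset, base in mismatches:
                chunk[offset] = base
            out.extend(chunk)
    return "".join(out)


# The second half repeats the first half with one substitution (T -> A).
seq = "ACGTTGCAAT" + "ACGTAGCAAT"
tokens = compress(seq)
assert decompress(tokens) == seq  # the round trip is lossless
```

In this example the first ten bases are emitted as literals and the second ten collapse into a single reference, `(0, 10, [(4, 'A')])`, which records the source position, the repeat length, and the one mismatching base. A real compressor would additionally pack bases and tokens into a compact binary form, which this sketch omits.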