
A Lossy Compression Technique Enabling Duplication-Aware Sequence Alignment

In spite of the recognized importance of tandem duplications in genome evolution, commonly adopted sequence comparison algorithms do not take into account complex mutation events involving more than one residue at a time, since they are not compliant with the underlying assumption of statistical i...


Bibliographic Details
Main Authors: Freschi, Valerio, Bogliolo, Alessandro
Format: Online Article Text
Language: English
Published: Libertas Academica 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3327517/
https://www.ncbi.nlm.nih.gov/pubmed/22518086
http://dx.doi.org/10.4137/EBO.S9131
author Freschi, Valerio
Bogliolo, Alessandro
collection PubMed
description In spite of the recognized importance of tandem duplications in genome evolution, commonly adopted sequence comparison algorithms do not take into account complex mutation events involving more than one residue at a time, since they are not compliant with the underlying assumption of statistical independence of adjacent residues. As a consequence, the presence of tandem repeats in sequences under comparison may impair the biological significance of the resulting alignment. Although solutions have been proposed, repeat-aware sequence alignment is still considered to be an open problem and new efficient and effective methods have been advocated. The present paper describes an alternative lossy compression scheme for genomic sequences which iteratively collapses repeats of increasing length. The resulting approximate representations do not contain tandem duplications, while retaining enough information for making their comparison even more significant than the edit distance between the original sequences. This allows us to exploit traditional alignment algorithms directly on the compressed sequences. Results confirm the validity of the proposed approach for the problem of duplication-aware sequence alignment.
format Online
Article
Text
id pubmed-3327517
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-3327517; 2012-04-19; Evol Bioinform Online; Original Research; Libertas Academica; 2012-04-02; Text; en; © the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited.
title A Lossy Compression Technique Enabling Duplication-Aware Sequence Alignment
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3327517/
https://www.ncbi.nlm.nih.gov/pubmed/22518086
http://dx.doi.org/10.4137/EBO.S9131
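
The description in this record outlines a two-step idea: collapse tandem repeats of increasing unit length, then compare the compressed sequences with a traditional alignment measure. The Python sketch below illustrates that general idea only. It is not the authors' published scheme: the function names (collapse_unit, compress, edit_distance) and the max_unit parameter are hypothetical, and the sketch simply discards extra repeat copies, whereas the record states that the published representation retains additional information.

```python
from typing import Optional


def collapse_unit(seq: str, k: int) -> str:
    """Collapse adjacent identical blocks of length k into a single copy."""
    out = []
    for ch in seq:
        out.append(ch)
        # If the last k symbols repeat the k symbols before them, drop the copy.
        if len(out) >= 2 * k and out[-k:] == out[-2 * k:-k]:
            del out[-k:]
    return "".join(out)


def compress(seq: str, max_unit: Optional[int] = None) -> str:
    """Collapse tandem repeats, shortest units first, until nothing changes.

    Assumption: copy counts are thrown away entirely (a simplification).
    """
    changed = True
    while changed:
        changed = False
        limit = max_unit if max_unit is not None else len(seq) // 2
        for k in range(1, limit + 1):
            collapsed = collapse_unit(seq, k)
            if collapsed != seq:
                seq = collapsed
                changed = True
    return seq


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (unit cost for insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    x, y = "ACGTGTGTA", "ACGTA"
    print(edit_distance(x, y))                      # 4
    print(edit_distance(compress(x), compress(y)))  # 0
```

In this toy case the two sequences differ only by a tandem expansion of "GT", so the raw edit distance counts four single-residue events while the compressed sequences are identical.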