A Lossy Compression Technique Enabling Duplication-Aware Sequence Alignment
In spite of the recognized importance of tandem duplications in genome evolution, commonly adopted sequence comparison algorithms do not take into account complex mutation events involving more than one residue at the time, since they are not compliant with the underlying assumption of statistical i...
Main authors: | Freschi, Valerio; Bogliolo, Alessandro |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Libertas Academica 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3327517/ https://www.ncbi.nlm.nih.gov/pubmed/22518086 http://dx.doi.org/10.4137/EBO.S9131 |
_version_ | 1782229666964176896 |
---|---|
author | Freschi, Valerio Bogliolo, Alessandro |
author_facet | Freschi, Valerio Bogliolo, Alessandro |
author_sort | Freschi, Valerio |
collection | PubMed |
description | In spite of the recognized importance of tandem duplications in genome evolution, commonly adopted sequence comparison algorithms do not take into account complex mutation events involving more than one residue at the time, since they are not compliant with the underlying assumption of statistical independence of adjacent residues. As a consequence, the presence of tandem repeats in sequences under comparison may impair the biological significance of the resulting alignment. Although solutions have been proposed, repeat-aware sequence alignment is still considered to be an open problem and new efficient and effective methods have been advocated. The present paper describes an alternative lossy compression scheme for genomic sequences which iteratively collapses repeats of increasing length. The resulting approximate representations do not contain tandem duplications, while retaining enough information for making their comparison even more significant than the edit distance between the original sequences. This allows us to exploit traditional alignment algorithms directly on the compressed sequences. Results confirm the validity of the proposed approach for the problem of duplication-aware sequence alignment. |
format | Online Article Text |
id | pubmed-3327517 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-3327517 2012-04-19 A Lossy Compression Technique Enabling Duplication-Aware Sequence Alignment Freschi, Valerio Bogliolo, Alessandro Evol Bioinform Online Original Research Libertas Academica 2012-04-02 /pmc/articles/PMC3327517/ /pubmed/22518086 http://dx.doi.org/10.4137/EBO.S9131 Text en © the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited. |
spellingShingle | Original Research Freschi, Valerio Bogliolo, Alessandro A Lossy Compression Technique Enabling Duplication-Aware Sequence Alignment |
title | A Lossy Compression Technique Enabling Duplication-Aware Sequence Alignment |
title_sort | lossy compression technique enabling duplication-aware sequence alignment |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3327517/ https://www.ncbi.nlm.nih.gov/pubmed/22518086 http://dx.doi.org/10.4137/EBO.S9131 |
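
The description field above outlines the idea only in prose: collapse tandem repeats of increasing length, then run a traditional alignment on the compressed sequences. The Python sketch below illustrates that general idea and should not be read as the authors' algorithm, whose details are not given in this record. The function names, the greedy left-to-right scan, and the `max_period` parameter are all assumptions made for illustration, and plain Levenshtein distance stands in for "traditional alignment".

```python
from typing import Optional


def collapse_tandem_repeats(seq: str, max_period: Optional[int] = None) -> str:
    """Lossy compression sketch: for period = 1, 2, ..., max_period, replace
    each run of adjacent identical units of that length with a single unit.
    Hypothetical helper, not the published method."""
    if max_period is None:
        max_period = len(seq) // 2
    for period in range(1, max_period + 1):
        out = []
        i = 0
        while i < len(seq):
            unit = seq[i:i + period]
            j = i + period
            # Advance j past every further adjacent copy of the unit.
            while len(unit) == period and seq[j:j + period] == unit:
                j += period
            if j > i + period:
                # A tandem repeat was found: keep one copy, drop the rest.
                out.append(unit)
                i = j
            else:
                # No repeat starts here: copy one residue and move on.
                out.append(seq[i])
                i += 1
        seq = "".join(out)
    return seq


def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance with unit costs (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    x, y = "ACGT", "ACACACGT"
    print(edit_distance(x, y))
    print(edit_distance(collapse_tandem_repeats(x), collapse_tandem_repeats(y)))
```

In this toy case the two sequences differ only by extra copies of the unit `AC`, so the distance between the compressed forms is lower than the distance between the originals. The compression is lossy because the number of copies in each collapsed run is discarded.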