MissMax: alignment-free sequence comparison with mismatches through filtering and heuristics
BACKGROUND: Measuring sequence similarity is central for many problems in bioinformatics. In several contexts alignment-free techniques based on exact occurrences of substrings are faster, but also less accurate, than alignment-based approaches. Recently, several studies attempted to bridge the accu...
Main author: | Pizzi, Cinzia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4839165/ https://www.ncbi.nlm.nih.gov/pubmed/27103940 http://dx.doi.org/10.1186/s13015-016-0072-x |
_version_ | 1782428106270703616 |
---|---|
author | Pizzi, Cinzia |
collection | PubMed |
description | BACKGROUND: Measuring sequence similarity is central for many problems in bioinformatics. In several contexts alignment-free techniques based on exact occurrences of substrings are faster, but also less accurate, than alignment-based approaches. Recently, several studies attempted to bridge the accuracy gap with the introduction of approximate matches in the definition of composition-based similarity measures. RESULTS: In this work we present MissMax, an exact algorithm for the computation of the longest common substring with mismatches between each suffix of a sequence x and a sequence y. This collection of statistics is useful for the computation of two similarity measures: the longest and the average common substring with k mismatches. As a further contribution we provide a “relaxed” version of MissMax that does not guarantee the exact solution, but it is faster in practice and still very precise. |
format | Online Article Text |
id | pubmed-4839165 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4839165 2016-04-22 MissMax: alignment-free sequence comparison with mismatches through filtering and heuristics Pizzi, Cinzia Algorithms Mol Biol Research BioMed Central 2016-04-21 /pmc/articles/PMC4839165/ /pubmed/27103940 http://dx.doi.org/10.1186/s13015-016-0072-x Text en © Pizzi. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | MissMax: alignment-free sequence comparison with mismatches through filtering and heuristics |
topic | Research |
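
The description above defines the core statistic as the longest common substring with mismatches between each suffix of a sequence x and a sequence y. As a rough illustration of that definition only, the following Python sketch computes it by brute force. It is not the MissMax algorithm, which relies on filtering and heuristics not detailed in this record; the function names are hypothetical, and the "average" here is assumed to be the plain mean over suffixes, without any normalization the article may apply.

```python
def lcs_k_per_suffix(x: str, y: str, k: int) -> list[int]:
    """For each suffix x[i:], return the length of its longest prefix that
    matches some substring of y with at most k mismatches (brute force)."""
    n, m = len(x), len(y)
    best = [0] * n
    for i in range(n):
        for j in range(m):
            mismatches = 0
            length = 0
            # Extend the pair (i, j) until a (k+1)-th mismatch or a string end.
            while i + length < n and j + length < m:
                if x[i + length] != y[j + length]:
                    if mismatches == k:
                        break
                    mismatches += 1
                length += 1
            best[i] = max(best[i], length)
    return best


def longest_k(x: str, y: str, k: int) -> int:
    """Longest common substring of x and y with at most k mismatches."""
    return max(lcs_k_per_suffix(x, y, k), default=0)


def average_k(x: str, y: str, k: int) -> float:
    """Assumed definition: plain mean of the per-suffix values."""
    values = lcs_k_per_suffix(x, y, k)
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    print(lcs_k_per_suffix("abcd", "abxd", 0))
    print(lcs_k_per_suffix("abcd", "abxd", 1))
    print(longest_k("abcd", "abxd", 1), average_k("abcd", "abxd", 1))
```

This version takes time proportional to the product of the two sequence lengths times the match length, so it is only practical for short inputs; it can serve as a reference to check a faster implementation against.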