
Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches

In this article, we present a user-friendly web interface for two alignment-free sequence-comparison methods that we recently developed. Most alignment-free methods rely on exact word matches to estimate pairwise similarities or distances between the input sequences. By contrast, our new algorithms...


Bibliographic Details
Main Authors:	Horwege, Sebastian; Lindner, Sebastian; Boden, Marcus; Hatje, Klas; Kollmar, Martin; Leimeister, Chris-André; Morgenstern, Burkhard
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2014
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086093/ https://www.ncbi.nlm.nih.gov/pubmed/24829447 http://dx.doi.org/10.1093/nar/gku398

_version_	1782324766492852224
author	Horwege, Sebastian; Lindner, Sebastian; Boden, Marcus; Hatje, Klas; Kollmar, Martin; Leimeister, Chris-André; Morgenstern, Burkhard
author_sort	Horwege, Sebastian
collection	PubMed
description	In this article, we present a user-friendly web interface for two alignment-free sequence-comparison methods that we recently developed. Most alignment-free methods rely on exact word matches to estimate pairwise similarities or distances between the input sequences. By contrast, our new algorithms are based on inexact word matches. The first of these approaches uses the relative frequencies of so-called spaced words in the input sequences, i.e. words containing ‘don't care’ or ‘wildcard’ symbols at certain pre-defined positions. Various distance measures can then be defined on sequences based on their different spaced-word composition. Our second approach defines the distance between two sequences by estimating for each position in the first sequence the length of the longest substring at this position that also occurs in the second sequence with up to k mismatches. Both approaches take a set of deoxyribonucleic acid (DNA) or protein sequences as input and return a matrix of pairwise distance values that can be used as a starting point for clustering algorithms or distance-based phylogeny reconstruction. The two alignment-free programmes are accessible through a web interface at ‘Göttingen Bioinformatics Compute Server (GOBICS)’: http://spaced.gobics.de http://kmacs.gobics.de and the source codes can be downloaded.
format	Online Article Text
id	pubmed-4086093
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-4086093 2014-10-28 Nucleic Acids Res Article Oxford University Press 2014-07-01 2014-05-14 /pmc/articles/PMC4086093/ /pubmed/24829447 http://dx.doi.org/10.1093/nar/gku398 Text en © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086093/ https://www.ncbi.nlm.nih.gov/pubmed/24829447 http://dx.doi.org/10.1093/nar/gku398
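
The two ideas summarized in the description field can be illustrated with a small Python sketch. This is not the authors' implementation: all function names are hypothetical, the pattern notation ('1' for a match position, '0' for a don't-care position) is an assumption, the Euclidean distance is only one example of the "various distance measures" the abstract mentions, and the second part is a brute-force computation rather than the faster estimate the published program presumably uses.

```python
from collections import Counter
from math import sqrt


def spaced_word_profile(seq, pattern):
    """Relative frequencies of spaced words in seq.

    pattern is a string of '1' (match position) and '0' (don't-care
    position); this notation is an assumption made for the sketch.
    Don't-care positions are written as '*' in the resulting words.
    """
    n_windows = len(seq) - len(pattern) + 1
    counts = Counter(
        "".join(seq[start + i] if c == "1" else "*"
                for i, c in enumerate(pattern))
        for start in range(max(n_windows, 0))
    )
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def euclidean_distance(p, q):
    """Euclidean distance between two spaced-word frequency profiles."""
    words = set(p) | set(q)
    return sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in words))


def longest_k_mismatch_match(s1, i, s2, k):
    """Length of the longest substring of s1 starting at position i that
    occurs somewhere in s2 with at most k mismatches (brute force)."""
    best = 0
    for j in range(len(s2)):
        mismatches = 0
        length = 0
        while i + length < len(s1) and j + length < len(s2):
            if s1[i + length] != s2[j + length]:
                mismatches += 1
                if mismatches > k:
                    break
            length += 1
        best = max(best, length)
    return best


def average_k_mismatch_length(s1, s2, k):
    """Mean of longest_k_mismatch_match over all positions of s1."""
    if not s1:
        return 0.0
    return sum(longest_k_mismatch_match(s1, i, s2, k)
               for i in range(len(s1))) / len(s1)
```

For example, `spaced_word_profile("ACGT", "101")` counts the spaced words `A*G` and `C*T` once each, and `longest_k_mismatch_match("ACGTACGT", 0, "ACCTACGA", 1)` extends the match across one mismatch. How such per-position lengths are turned into a distance value is defined in the article itself and is not reproduced here.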
