Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points

Bibliographic Details
Main authors:	Leimeister, Chris-André; Dencker, Thomas; Morgenstern, Burkhard
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2019
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6330006/ https://www.ncbi.nlm.nih.gov/pubmed/29992260 http://dx.doi.org/10.1093/bioinformatics/bty592

_version_	1783386910228480000
author	Leimeister, Chris-André; Dencker, Thomas; Morgenstern, Burkhard
collection	PubMed
description	MOTIVATION: Most methods for pairwise and multiple genome alignment use fast local homology search tools to identify anchor points, i.e. high-scoring local alignments of the input sequences. Sequence segments between those anchor points are then aligned with slower, more sensitive methods. Finding suitable anchor points is therefore crucial for genome sequence comparison; speed and sensitivity of genome alignment depend on the underlying anchoring methods. RESULTS: In this article, we use filtered spaced word matches to generate anchor points for genome alignment. For a given binary pattern representing match and don’t-care positions, we first search for spaced-word matches, i.e. ungapped local pairwise alignments with matching nucleotides at the match positions of the pattern and possible mismatches at the don’t-care positions. Those spaced-word matches that have similarity scores above some threshold value are then extended using a standard X-drop algorithm; the resulting local alignments are used as anchor points. To evaluate this approach, we used the popular multiple-genome-alignment pipeline Mugsy and replaced the exact word matches that Mugsy uses as anchor points with our spaced-word-based anchor points. For closely related genome sequences, the two anchoring procedures lead to multiple alignments of similar quality. For distantly related genomes, however, alignments calculated with our filtered-spaced-word matches are superior to alignments produced with the original Mugsy program where exact word matches are used to find anchor points. AVAILABILITY AND IMPLEMENTATION: http://spacedanchor.gobics.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-6330006
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6330006 2019-01-15 Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points Leimeister, Chris-André; Dencker, Thomas; Morgenstern, Burkhard Bioinformatics Original Papers Oxford University Press 2019-01-15 2018-07-10 /pmc/articles/PMC6330006/ /pubmed/29992260 http://dx.doi.org/10.1093/bioinformatics/bty592 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6330006/ https://www.ncbi.nlm.nih.gov/pubmed/29992260 http://dx.doi.org/10.1093/bioinformatics/bty592
work_keys_str_mv	AT leimeisterchrisandre accuratemultiplealignmentofdistantlyrelatedgenomesequencesusingfilteredspacedwordmatchesasanchorpoints AT denckerthomas accuratemultiplealignmentofdistantlyrelatedgenomesequencesusingfilteredspacedwordmatchesasanchorpoints AT morgensternburkhard accuratemultiplealignmentofdistantlyrelatedgenomesequencesusingfilteredspacedwordmatchesasanchorpoints
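
The abstract in this record describes a three-step anchoring procedure: find spaced-word matches for a binary pattern, keep those whose similarity score exceeds a threshold, and extend the survivors with an X-drop algorithm. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The function names, the toy +1/-1 scoring (a real tool would more likely use a nucleotide substitution matrix), and the ungapped form of the X-drop extension are all choices made here for illustration only.

```python
from collections import defaultdict

MATCH, MISMATCH = 1, -1  # assumed toy scores, not taken from the article


def spaced_word_matches(s1, s2, pattern):
    """Yield (i, j) where s1 and s2 agree at every match ('1') position of pattern."""
    care = [k for k, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    index = defaultdict(list)
    # Index every spaced word of s1 by its characters at the match positions.
    for i in range(len(s1) - span + 1):
        index[tuple(s1[i + k] for k in care)].append(i)
    # Look up every spaced word of s2; don't-care positions are ignored here.
    for j in range(len(s2) - span + 1):
        for i in index.get(tuple(s2[j + k] for k in care), ()):
            yield i, j


def window_score(s1, s2, i, j, length):
    """Score all columns of the ungapped window, don't-care positions included."""
    return sum(MATCH if s1[i + k] == s2[j + k] else MISMATCH for k in range(length))


def extend(s1, s2, i, j, step, x_drop):
    """Ungapped X-drop walk from (i, j); return (best gain, columns kept)."""
    score = best = 0
    best_len = n = 0
    while 0 <= i < len(s1) and 0 <= j < len(s2):
        score += MATCH if s1[i] == s2[j] else MISMATCH
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score > x_drop:
            break  # score fell more than x_drop below the best seen so far
        i += step
        j += step
    return best, best_len


def anchors(s1, s2, pattern, threshold, x_drop):
    """Return sorted (start1, start2, length, score) tuples for extended matches."""
    span = len(pattern)
    result = set()
    for i, j in spaced_word_matches(s1, s2, pattern):
        seed = window_score(s1, s2, i, j, span)
        if seed <= threshold:
            continue  # filter step: keep only matches scoring above the threshold
        right, r_len = extend(s1, s2, i + span, j + span, 1, x_drop)
        left, l_len = extend(s1, s2, i - 1, j - 1, -1, x_drop)
        result.add((i - l_len, j - l_len, span + l_len + r_len, seed + left + right))
    return sorted(result)
```

Collecting the results in a set means that overlapping seeds on the same diagonal which extend to the same local alignment are reported once. In a pipeline such as the one the abstract describes, the resulting local alignments would then be handed to the aligner as anchor points.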
