Cargando…

SlideSort: all pairs similarity search for short reads

Motivation: Recent progress in DNA sequencing technologies calls for fast and accurate algorithms that can evaluate sequence similarity for a huge amount of short reads. Searching similar pairs from a string pool is a fundamental process of de novo genome assembly, genome-wide alignment and other im...

Descripción completa

Detalles Bibliográficos
Autores principales:	Shimizu, Kana, Tsuda, Koji
Formato:	Texto
Lenguaje:	English
Publicado:	Oxford University Press 2011
Materias:	Original Papers
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3035798/ https://www.ncbi.nlm.nih.gov/pubmed/21148542 http://dx.doi.org/10.1093/bioinformatics/btq677

_version_	1782197815074619392
author	Shimizu, Kana Tsuda, Koji
author_facet	Shimizu, Kana Tsuda, Koji
author_sort	Shimizu, Kana
collection	PubMed
description	Motivation: Recent progress in DNA sequencing technologies calls for fast and accurate algorithms that can evaluate sequence similarity for a huge amount of short reads. Searching similar pairs from a string pool is a fundamental process of de novo genome assembly, genome-wide alignment and other important analyses. Results: In this study, we designed and implemented an exact algorithm SlideSort that finds all similar pairs from a string pool in terms of edit distance. Using an efficient pattern growth algorithm, SlideSort discovers chains of common k-mers to narrow down the search. Compared to existing methods based on single k-mers, our method is more effective in reducing the number of edit distance calculations. In comparison to backtracking methods such as BWA, our method is much faster in finding remote matches, scaling easily to tens of millions of sequences. Our software has an additional function of single link clustering, which is useful in summarizing short reads for further processing. Availability: Executable binary files and C++ libraries are available at http://www.cbrc.jp/~shimizu/slidesort/ for Linux and Windows. Contact: slidesort@m.aist.go.jp; shimizu-kana@aist.go.jp Supplementary information: Supplementary data are available at Bioinformatics online.
format	Text
id	pubmed-3035798
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-30357982011-02-09 SlideSort: all pairs similarity search for short reads Shimizu, Kana Tsuda, Koji Bioinformatics Original Papers Motivation: Recent progress in DNA sequencing technologies calls for fast and accurate algorithms that can evaluate sequence similarity for a huge amount of short reads. Searching similar pairs from a string pool is a fundamental process of de novo genome assembly, genome-wide alignment and other important analyses. Results: In this study, we designed and implemented an exact algorithm SlideSort that finds all similar pairs from a string pool in terms of edit distance. Using an efficient pattern growth algorithm, SlideSort discovers chains of common k-mers to narrow down the search. Compared to existing methods based on single k-mers, our method is more effective in reducing the number of edit distance calculations. In comparison to backtracking methods such as BWA, our method is much faster in finding remote matches, scaling easily to tens of millions of sequences. Our software has an additional function of single link clustering, which is useful in summarizing short reads for further processing. Availability: Executable binary files and C++ libraries are available at http://www.cbrc.jp/~shimizu/slidesort/ for Linux and Windows. Contact: slidesort@m.aist.go.jp; shimizu-kana@aist.go.jp Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2011-02-15 2010-12-09 /pmc/articles/PMC3035798/ /pubmed/21148542 http://dx.doi.org/10.1093/bioinformatics/btq677 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Papers Shimizu, Kana Tsuda, Koji SlideSort: all pairs similarity search for short reads
title	SlideSort: all pairs similarity search for short reads
title_full	SlideSort: all pairs similarity search for short reads
title_fullStr	SlideSort: all pairs similarity search for short reads
title_full_unstemmed	SlideSort: all pairs similarity search for short reads
title_short	SlideSort: all pairs similarity search for short reads
title_sort	slidesort: all pairs similarity search for short reads
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3035798/ https://www.ncbi.nlm.nih.gov/pubmed/21148542 http://dx.doi.org/10.1093/bioinformatics/btq677
work_keys_str_mv	AT shimizukana slidesortallpairssimilaritysearchforshortreads AT tsudakoji slidesortallpairssimilaritysearchforshortreads

SlideSort: all pairs similarity search for short reads

Ejemplares similares