
GASSST: global alignment short sequence search tool

Motivation: The rapid development of next-generation sequencing technologies able to produce huge amounts of sequence data is leading to a wide range of new applications. This triggers the need for fast and accurate alignment software. Common techniques often restrict indels in the alignment to impr...

Bibliographic Details
Main Authors: Rizk, Guillaume, Lavenier, Dominique
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2951093/
https://www.ncbi.nlm.nih.gov/pubmed/20739310
http://dx.doi.org/10.1093/bioinformatics/btq485
_version_ 1782187685125816320
author Rizk, Guillaume
Lavenier, Dominique
author_facet Rizk, Guillaume
Lavenier, Dominique
author_sort Rizk, Guillaume
collection PubMed
description Motivation: The rapid development of next-generation sequencing technologies able to produce huge amounts of sequence data is leading to a wide range of new applications. This triggers the need for fast and accurate alignment software. Common techniques often restrict indels in the alignment to improve speed, whereas more flexible aligners are too slow for large-scale applications. Moreover, many current aligners are becoming inefficient as generated reads grow ever larger. Our goal with our new aligner GASSST (Global Alignment Short Sequence Search Tool) is thus 2-fold—achieving high performance with no restrictions on the number of indels with a design that is still effective on long reads. Results: We propose a new efficient filtering step that discards most alignments coming from the seed phase before they are checked by the costly dynamic programming algorithm. We use a carefully designed series of filters of increasing complexity and efficiency to quickly eliminate most candidate alignments in a wide range of configurations. The main filter uses a precomputed table containing the alignment score of short four base words aligned against each other. This table is reused several times by a new algorithm designed to approximate the score of the full dynamic programming algorithm. We compare the performance of GASSST against BWA, BFAST, SSAHA2 and PASS. We found that GASSST achieves high sensitivity in a wide range of configurations and faster overall execution time than other state-of-the-art aligners. Availability: GASSST is distributed under the CeCILL software license at http://www.irisa.fr/symbiose/projects/gassst/ Contact: guillaume.rizk@irisa.fr; dominique.lavenier@irisa.fr Supplementary information: Supplementary data are available at Bioinformatics online.
format Text
id pubmed-2951093
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-29510932010-10-08 GASSST: global alignment short sequence search tool Rizk, Guillaume Lavenier, Dominique Bioinformatics Original Paper Motivation: The rapid development of next-generation sequencing technologies able to produce huge amounts of sequence data is leading to a wide range of new applications. This triggers the need for fast and accurate alignment software. Common techniques often restrict indels in the alignment to improve speed, whereas more flexible aligners are too slow for large-scale applications. Moreover, many current aligners are becoming inefficient as generated reads grow ever larger. Our goal with our new aligner GASSST (Global Alignment Short Sequence Search Tool) is thus 2-fold—achieving high performance with no restrictions on the number of indels with a design that is still effective on long reads. Results: We propose a new efficient filtering step that discards most alignments coming from the seed phase before they are checked by the costly dynamic programming algorithm. We use a carefully designed series of filters of increasing complexity and efficiency to quickly eliminate most candidate alignments in a wide range of configurations. The main filter uses a precomputed table containing the alignment score of short four base words aligned against each other. This table is reused several times by a new algorithm designed to approximate the score of the full dynamic programming algorithm. We compare the performance of GASSST against BWA, BFAST, SSAHA2 and PASS. We found that GASSST achieves high sensitivity in a wide range of configurations and faster overall execution time than other state-of-the-art aligners. Availability: GASSST is distributed under the CeCILL software license at http://www.irisa.fr/symbiose/projects/gassst/ Contact: guillaume.rizk@irisa.fr; dominique.lavenier@irisa.fr Supplementary information: Supplementary data are available at Bioinformatics online. 
Oxford University Press 2010-10-15 2010-08-24 /pmc/articles/PMC2951093/ /pubmed/20739310 http://dx.doi.org/10.1093/bioinformatics/btq485 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Paper
Rizk, Guillaume
Lavenier, Dominique
GASSST: global alignment short sequence search tool
title GASSST: global alignment short sequence search tool
title_full GASSST: global alignment short sequence search tool
title_fullStr GASSST: global alignment short sequence search tool
title_full_unstemmed GASSST: global alignment short sequence search tool
title_short GASSST: global alignment short sequence search tool
title_sort gassst: global alignment short sequence search tool
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2951093/
https://www.ncbi.nlm.nih.gov/pubmed/20739310
http://dx.doi.org/10.1093/bioinformatics/btq485
work_keys_str_mv AT rizkguillaume gassstglobalalignmentshortsequencesearchtool
AT lavenierdominique gassstglobalalignmentshortsequencesearchtool
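
Illustrative note on the filtering idea in the description: the abstract mentions a precomputed table holding the alignment score of short four-base words aligned against each other, reused to approximate the full dynamic programming score. The Python sketch below shows one way such a table could be built and reused as a cheap pre-filter. It is not the GASSST implementation. The scoring scheme (unit-cost edit distance), the tiling of the read into non-overlapping words, the `max_shift` window, and all function names are assumptions made for illustration, and the estimate is a heuristic rather than a proven bound.

```python
from itertools import product

BASES = "ACGT"   # assumes uppercase reads with no N or other symbols
WORD = 4         # word length named in the abstract


def edit_distance(a, b):
    """Unit-cost edit distance by standard dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # gap in b
                           cur[j - 1] + 1,             # gap in a
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = cur
    return prev[-1]


def encode(word):
    """Map a word over ACGT to an integer index (2 bits per base)."""
    code = 0
    for c in word:
        code = code * 4 + BASES.index(c)
    return code


# Precompute the 256 x 256 table once; every later lookup is O(1).
# itertools.product enumerates words in the same order as encode().
WORDS = ["".join(p) for p in product(BASES, repeat=WORD)]
TABLE = [[edit_distance(a, b) for b in WORDS] for a in WORDS]


def approx_errors(read, ref, pos, max_shift=1):
    """Heuristic error estimate for placing `read` at `ref[pos:]`.

    Each non-overlapping read word is compared, via the table, with the
    reference words starting within +/- max_shift of its expected offset,
    and the best score per word is summed. Trailing bases that do not
    fill a whole word are ignored (a simplification).
    """
    total = 0
    for i in range(0, len(read) - WORD + 1, WORD):
        r = encode(read[i:i + WORD])
        best = WORD  # worst case: every base of the word is wrong
        for d in range(-max_shift, max_shift + 1):
            s = pos + i + d
            if s >= 0 and s + WORD <= len(ref):
                best = min(best, TABLE[r][encode(ref[s:s + WORD])])
        total += best
    return total


def passes_filter(read, ref, pos, max_errors, max_shift=1):
    """Keep a candidate only if its estimate is within the error budget."""
    return approx_errors(read, ref, pos, max_shift) <= max_errors
```

In a seed-and-extend pipeline, a check like `passes_filter` would run on each seed hit, and only the survivors would go on to the full dynamic programming alignment. The table costs 65,536 small alignments to build, after which each candidate needs only a handful of lookups per read word.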