Cargando…

Fast inexact mapping using advanced tree exploration on backward search methods

BACKGROUND: Short sequence mapping methods for Next Generation Sequencing consist on a combination of seeding techniques followed by local alignment based on dynamic programming approaches. Most seeding algorithms are based on backward search alignment, using the Burrows Wheeler Transform, the Ferra...

Descripción completa

Detalles Bibliográficos
Autores principales: Salavert, José, Tomás, Andrés, Tárraga, Joaquín, Medina, Ignacio, Dopazo, Joaquín, Blanquer, Ignacio
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2015
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4384339/
https://www.ncbi.nlm.nih.gov/pubmed/25626517
http://dx.doi.org/10.1186/s12859-014-0438-3
_version_ 1782364888443650048
author Salavert, José
Tomás, Andrés
Tárraga, Joaquín
Medina, Ignacio
Dopazo, Joaquín
Blanquer, Ignacio
author_facet Salavert, José
Tomás, Andrés
Tárraga, Joaquín
Medina, Ignacio
Dopazo, Joaquín
Blanquer, Ignacio
author_sort Salavert, José
collection PubMed
description BACKGROUND: Short sequence mapping methods for Next Generation Sequencing consist on a combination of seeding techniques followed by local alignment based on dynamic programming approaches. Most seeding algorithms are based on backward search alignment, using the Burrows Wheeler Transform, the Ferragina and Manzini Index or Suffix Arrays. All these backward search algorithms have excellent performance, but their computational cost highly increases when allowing errors. In this paper, we discuss an inexact mapping algorithm based on pruning strategies for search tree exploration over genomic data. RESULTS: The proposed algorithm achieves a 13x speed-up over similar algorithms when allowing 6 base errors, including insertions, deletions and mismatches. This algorithm can deal with 400 bps reads with up to 9 errors in a high quality Illumina dataset. In this example, the algorithm works as a preprocessor that reduces by 55% the number of reads to be aligned. Depending on the aligner the overall execution time is reduced between 20–40%. CONCLUSIONS: Although not intended as a complete sequence mapping tool, the proposed algorithm could be used as a preprocessing step to modern sequence mappers. This step significantly reduces the number reads to be aligned, accelerating overall alignment time. Furthermore, this algorithm could be used for accelerating the seeding step of already available sequence mappers. In addition, an out-of-core index has been implemented for working with large genomes on systems without expensive memory configurations.
format Online
Article
Text
id pubmed-4384339
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-43843392015-04-04 Fast inexact mapping using advanced tree exploration on backward search methods Salavert, José Tomás, Andrés Tárraga, Joaquín Medina, Ignacio Dopazo, Joaquín Blanquer, Ignacio BMC Bioinformatics Methodology Article BACKGROUND: Short sequence mapping methods for Next Generation Sequencing consist on a combination of seeding techniques followed by local alignment based on dynamic programming approaches. Most seeding algorithms are based on backward search alignment, using the Burrows Wheeler Transform, the Ferragina and Manzini Index or Suffix Arrays. All these backward search algorithms have excellent performance, but their computational cost highly increases when allowing errors. In this paper, we discuss an inexact mapping algorithm based on pruning strategies for search tree exploration over genomic data. RESULTS: The proposed algorithm achieves a 13x speed-up over similar algorithms when allowing 6 base errors, including insertions, deletions and mismatches. This algorithm can deal with 400 bps reads with up to 9 errors in a high quality Illumina dataset. In this example, the algorithm works as a preprocessor that reduces by 55% the number of reads to be aligned. Depending on the aligner the overall execution time is reduced between 20–40%. CONCLUSIONS: Although not intended as a complete sequence mapping tool, the proposed algorithm could be used as a preprocessing step to modern sequence mappers. This step significantly reduces the number reads to be aligned, accelerating overall alignment time. Furthermore, this algorithm could be used for accelerating the seeding step of already available sequence mappers. In addition, an out-of-core index has been implemented for working with large genomes on systems without expensive memory configurations. BioMed Central 2015-01-28 /pmc/articles/PMC4384339/ /pubmed/25626517 http://dx.doi.org/10.1186/s12859-014-0438-3 Text en © Torres et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Salavert, José
Tomás, Andrés
Tárraga, Joaquín
Medina, Ignacio
Dopazo, Joaquín
Blanquer, Ignacio
Fast inexact mapping using advanced tree exploration on backward search methods
title Fast inexact mapping using advanced tree exploration on backward search methods
title_full Fast inexact mapping using advanced tree exploration on backward search methods
title_fullStr Fast inexact mapping using advanced tree exploration on backward search methods
title_full_unstemmed Fast inexact mapping using advanced tree exploration on backward search methods
title_short Fast inexact mapping using advanced tree exploration on backward search methods
title_sort fast inexact mapping using advanced tree exploration on backward search methods
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4384339/
https://www.ncbi.nlm.nih.gov/pubmed/25626517
http://dx.doi.org/10.1186/s12859-014-0438-3
work_keys_str_mv AT salavertjose fastinexactmappingusingadvancedtreeexplorationonbackwardsearchmethods
AT tomasandres fastinexactmappingusingadvancedtreeexplorationonbackwardsearchmethods
AT tarragajoaquin fastinexactmappingusingadvancedtreeexplorationonbackwardsearchmethods
AT medinaignacio fastinexactmappingusingadvancedtreeexplorationonbackwardsearchmethods
AT dopazojoaquin fastinexactmappingusingadvancedtreeexplorationonbackwardsearchmethods
AT blanquerignacio fastinexactmappingusingadvancedtreeexplorationonbackwardsearchmethods