Cargando…
Fast inexact mapping using advanced tree exploration on backward search methods
BACKGROUND: Short sequence mapping methods for Next Generation Sequencing consist on a combination of seeding techniques followed by local alignment based on dynamic programming approaches. Most seeding algorithms are based on backward search alignment, using the Burrows Wheeler Transform, the Ferra...
Autores principales: | , , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2015
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4384339/ https://www.ncbi.nlm.nih.gov/pubmed/25626517 http://dx.doi.org/10.1186/s12859-014-0438-3 |
_version_ | 1782364888443650048 |
---|---|
author | Salavert, José Tomás, Andrés Tárraga, Joaquín Medina, Ignacio Dopazo, Joaquín Blanquer, Ignacio |
author_facet | Salavert, José Tomás, Andrés Tárraga, Joaquín Medina, Ignacio Dopazo, Joaquín Blanquer, Ignacio |
author_sort | Salavert, José |
collection | PubMed |
description | BACKGROUND: Short sequence mapping methods for Next Generation Sequencing consist on a combination of seeding techniques followed by local alignment based on dynamic programming approaches. Most seeding algorithms are based on backward search alignment, using the Burrows Wheeler Transform, the Ferragina and Manzini Index or Suffix Arrays. All these backward search algorithms have excellent performance, but their computational cost highly increases when allowing errors. In this paper, we discuss an inexact mapping algorithm based on pruning strategies for search tree exploration over genomic data. RESULTS: The proposed algorithm achieves a 13x speed-up over similar algorithms when allowing 6 base errors, including insertions, deletions and mismatches. This algorithm can deal with 400 bps reads with up to 9 errors in a high quality Illumina dataset. In this example, the algorithm works as a preprocessor that reduces by 55% the number of reads to be aligned. Depending on the aligner the overall execution time is reduced between 20–40%. CONCLUSIONS: Although not intended as a complete sequence mapping tool, the proposed algorithm could be used as a preprocessing step to modern sequence mappers. This step significantly reduces the number reads to be aligned, accelerating overall alignment time. Furthermore, this algorithm could be used for accelerating the seeding step of already available sequence mappers. In addition, an out-of-core index has been implemented for working with large genomes on systems without expensive memory configurations. |
format | Online Article Text |
id | pubmed-4384339 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-43843392015-04-04 Fast inexact mapping using advanced tree exploration on backward search methods Salavert, José Tomás, Andrés Tárraga, Joaquín Medina, Ignacio Dopazo, Joaquín Blanquer, Ignacio BMC Bioinformatics Methodology Article BACKGROUND: Short sequence mapping methods for Next Generation Sequencing consist on a combination of seeding techniques followed by local alignment based on dynamic programming approaches. Most seeding algorithms are based on backward search alignment, using the Burrows Wheeler Transform, the Ferragina and Manzini Index or Suffix Arrays. All these backward search algorithms have excellent performance, but their computational cost highly increases when allowing errors. In this paper, we discuss an inexact mapping algorithm based on pruning strategies for search tree exploration over genomic data. RESULTS: The proposed algorithm achieves a 13x speed-up over similar algorithms when allowing 6 base errors, including insertions, deletions and mismatches. This algorithm can deal with 400 bps reads with up to 9 errors in a high quality Illumina dataset. In this example, the algorithm works as a preprocessor that reduces by 55% the number of reads to be aligned. Depending on the aligner the overall execution time is reduced between 20–40%. CONCLUSIONS: Although not intended as a complete sequence mapping tool, the proposed algorithm could be used as a preprocessing step to modern sequence mappers. This step significantly reduces the number reads to be aligned, accelerating overall alignment time. Furthermore, this algorithm could be used for accelerating the seeding step of already available sequence mappers. In addition, an out-of-core index has been implemented for working with large genomes on systems without expensive memory configurations. BioMed Central 2015-01-28 /pmc/articles/PMC4384339/ /pubmed/25626517 http://dx.doi.org/10.1186/s12859-014-0438-3 Text en © Torres et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Salavert, José Tomás, Andrés Tárraga, Joaquín Medina, Ignacio Dopazo, Joaquín Blanquer, Ignacio Fast inexact mapping using advanced tree exploration on backward search methods |
title | Fast inexact mapping using advanced tree exploration on backward search methods |
title_full | Fast inexact mapping using advanced tree exploration on backward search methods |
title_fullStr | Fast inexact mapping using advanced tree exploration on backward search methods |
title_full_unstemmed | Fast inexact mapping using advanced tree exploration on backward search methods |
title_short | Fast inexact mapping using advanced tree exploration on backward search methods |
title_sort | fast inexact mapping using advanced tree exploration on backward search methods |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4384339/ https://www.ncbi.nlm.nih.gov/pubmed/25626517 http://dx.doi.org/10.1186/s12859-014-0438-3 |
work_keys_str_mv | AT salavertjose fastinexactmappingusingadvancedtreeexplorationonbackwardsearchmethods AT tomasandres fastinexactmappingusingadvancedtreeexplorationonbackwardsearchmethods AT tarragajoaquin fastinexactmappingusingadvancedtreeexplorationonbackwardsearchmethods AT medinaignacio fastinexactmappingusingadvancedtreeexplorationonbackwardsearchmethods AT dopazojoaquin fastinexactmappingusingadvancedtreeexplorationonbackwardsearchmethods AT blanquerignacio fastinexactmappingusingadvancedtreeexplorationonbackwardsearchmethods |