Vargas: heuristic-free alignment for assessing linear and graph read aligners

MOTIVATION: Read alignment is central to many aspects of modern genomics. Most aligners use heuristics to accelerate processing, but these heuristics can fail to find the optimal alignments of reads. Alignment accuracy is typically measured through simulated reads; however, the simulated location ma...

Bibliographic Details
Main authors: Darby, Charlotte A; Gaddipati, Ravi; Schatz, Michael C; Langmead, Ben
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320598/
https://www.ncbi.nlm.nih.gov/pubmed/32321164
http://dx.doi.org/10.1093/bioinformatics/btaa265
author Darby, Charlotte A
Gaddipati, Ravi
Schatz, Michael C
Langmead, Ben
collection PubMed
description MOTIVATION: Read alignment is central to many aspects of modern genomics. Most aligners use heuristics to accelerate processing, but these heuristics can fail to find the optimal alignments of reads. Alignment accuracy is typically measured through simulated reads; however, the simulated location may not be the (only) location with the optimal alignment score. RESULTS: Vargas implements a heuristic-free algorithm guaranteed to find the highest-scoring alignment for real sequencing reads to a linear or graph genome. With semiglobal and local alignment modes and affine gap and quality-scaled mismatch penalties, it can implement the scoring functions of commonly used aligners to calculate optimal alignments. While this is computationally intensive, Vargas uses multi-core parallelization and vectorized (SIMD) instructions to make it practical to optimally align large numbers of reads, achieving a maximum speed of 456 billion cell updates per second. We demonstrate how these ‘gold standard’ Vargas alignments can be used to improve heuristic alignment accuracy by optimizing command-line parameters in Bowtie 2, BWA-maximal exact match and vg to align more reads correctly. AVAILABILITY AND IMPLEMENTATION: Source code implemented in C++ and compiled binary releases are available at https://github.com/langmead-lab/vargas under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7320598
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7320598 2020-07-01 Vargas: heuristic-free alignment for assessing linear and graph read aligners Darby, Charlotte A Gaddipati, Ravi Schatz, Michael C Langmead, Ben Bioinformatics Original Papers Oxford University Press 2020-06-15 2020-04-22 /pmc/articles/PMC7320598/ /pubmed/32321164 http://dx.doi.org/10.1093/bioinformatics/btaa265 Text en © The Author(s) 2020. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Vargas: heuristic-free alignment for assessing linear and graph read aligners
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320598/
https://www.ncbi.nlm.nih.gov/pubmed/32321164
http://dx.doi.org/10.1093/bioinformatics/btaa265