lra: A long read aligner for sequences and contigs

Bibliographic Details
Main authors: Ren, Jingwen; Chaisson, Mark J. P.
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8248648/
https://www.ncbi.nlm.nih.gov/pubmed/34153026
http://dx.doi.org/10.1371/journal.pcbi.1009078
collection PubMed
description It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), in which optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for structural variant (SV) discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, while having runtimes comparable to current methods (1.05–3.76×). When applied to calling variation from de novo assembly contigs, lra yields a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in bioconda (https://anaconda.org/bioconda/lra) and on GitHub (https://github.com/ChaissonLab/LRA).
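To illustrate why the shape of the gap cost matters when chaining exact matches, the following Python sketch chains anchors with a naive O(n²) dynamic program under either a linear or a concave gap cost. It is not lra's implementation. A real sparse dynamic programming aligner avoids this quadratic loop. The logarithmic form of `concave_gap`, its parameters `a` and `b`, and the helper names `linear_gap`, `concave_gap` and `chain` are all hypothetical choices made for this example. Gaps are measured only as the net insertion/deletion length between consecutive anchors, and mismatches inside a gap are ignored.

```python
import math
from typing import Callable, List, Tuple

# An anchor is an exact match: (query_start, ref_start, length).
Anchor = Tuple[int, int, int]


def linear_gap(g: int, a: float = 1.0) -> float:
    """Linear gap cost: every extra base of gap costs the same amount."""
    return a * g


def concave_gap(g: int, a: float = 1.0, b: float = 4.0) -> float:
    """Hypothetical concave gap cost: an opening penalty plus a log term,
    so the cost grows only logarithmically with gap length."""
    return 0.0 if g == 0 else a + b * math.log(1 + g)


def chain(anchors: List[Anchor],
          gap_cost: Callable[[int], float]) -> Tuple[float, List[int]]:
    """Naive O(n^2) chaining of co-linear, non-overlapping anchors.

    Returns the best chain score and the input indices of the chained anchors.
    """
    if not anchors:
        return 0.0, []
    # Process anchors in query order so every valid predecessor is scored first.
    order = sorted(range(len(anchors)), key=lambda k: (anchors[k][0], anchors[k][1]))
    a = [anchors[k] for k in order]
    n = len(a)
    score = [float(length) for _, _, length in a]  # chain of a single anchor
    prev = [-1] * n
    for i in range(n):
        qi, ri, li = a[i]
        for j in range(i):
            qj, rj, lj = a[j]
            # Anchor j must end before anchor i starts on both sequences.
            if qj + lj <= qi and rj + lj <= ri:
                dq = qi - (qj + lj)
                dr = ri - (rj + lj)
                gap = abs(dq - dr)  # net insertion/deletion length between anchors
                candidate = score[j] + li - gap_cost(gap)
                if candidate > score[i]:
                    score[i], prev[i] = candidate, j
    best = max(range(n), key=lambda k: score[k])
    path = []
    k = best
    while k != -1:
        path.append(order[k])
        k = prev[k]
    return score[best], path[::-1]


# Two 500 bp anchors separated by a 2,000 bp deletion in the query.
anchors = [(0, 0, 500), (500, 2500, 500)]
print(chain(anchors, linear_gap))   # (500.0, [0]): the deletion is too costly to bridge
print(chain(anchors, concave_gap))  # path [0, 1]: the chain spans the deletion
```

In this toy example, the linear cost charges 2,000 points for the deletion. That is more than the second anchor contributes, so the best chain stops at the first anchor. The concave cost charges about 31 points, so the chain bridges the variant. This is the intuition behind using a concave gap penalty for large SVs.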
id pubmed-8248648
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8248648 (2021-07-09) PLoS Comput Biol, Research Article, Public Library of Science, published 2021-06-21
© 2021 Ren, Chaisson. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article