lra: A long read aligner for sequences and contigs
It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the geno...
Main Authors: | Ren, Jingwen; Chaisson, Mark J. P. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2021 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8248648/ https://www.ncbi.nlm.nih.gov/pubmed/34153026 http://dx.doi.org/10.1371/journal.pcbi.1009078 |
_version_ | 1783716766975787008 |
---|---|
author | Ren, Jingwen; Chaisson, Mark J. P. |
author_facet | Ren, Jingwen; Chaisson, Mark J. P. |
author_sort | Ren, Jingwen |
collection | PubMed |
description | It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for structural variant (SV) discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, with runtimes that are comparable (1.05-3.76×) to those of current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in Bioconda (https://anaconda.org/bioconda/lra) and on GitHub (https://github.com/ChaissonLab/LRA). |
format | Online Article Text |
id | pubmed-8248648 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-8248648 2021-07-09 lra: A long read aligner for sequences and contigs Ren, Jingwen; Chaisson, Mark J. P. PLoS Comput Biol Research Article It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for structural variant (SV) discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, with runtimes that are comparable (1.05-3.76×) to those of current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in Bioconda (https://anaconda.org/bioconda/lra) and on GitHub (https://github.com/ChaissonLab/LRA). Public Library of Science 2021-06-21 /pmc/articles/PMC8248648/ /pubmed/34153026 http://dx.doi.org/10.1371/journal.pcbi.1009078 Text en © 2021 Ren, Chaisson https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Ren, Jingwen; Chaisson, Mark J. P. lra: A long read aligner for sequences and contigs |
title | lra: A long read aligner for sequences and contigs |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8248648/ https://www.ncbi.nlm.nih.gov/pubmed/34153026 http://dx.doi.org/10.1371/journal.pcbi.1009078 |
work_keys_str_mv | AT renjingwen lraalongreadalignerforsequencesandcontigs AT chaissonmarkjp lraalongreadalignerforsequencesandcontigs |
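
The description in this record contrasts linear and concave gap costs when chaining exact matches. The sketch below illustrates that contrast only; it is not lra's implementation. It uses a naive quadratic-time dynamic program rather than sparse dynamic programming, and the anchor tuples, both cost functions, and all constants are assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical anchor format: (read_pos, genome_pos, length) of one exact match.
Anchor = Tuple[int, int, int]


def linear_gap_cost(gap: int) -> float:
    """Linear penalty (assumed slope): cost grows in proportion to gap length."""
    return 0.5 * gap


def concave_gap_cost(gap: int) -> float:
    """Concave penalty (assumed form): cost grows ever more slowly with gap length."""
    return 0.0 if gap == 0 else 2.0 + 4.0 * math.log2(gap + 1)


def chain(anchors: List[Anchor],
          gap_cost: Callable[[int], float]) -> Tuple[float, List[Anchor]]:
    """Return (score, anchors) of the best colinear chain using a naive O(n^2) DP."""
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return 0.0, []
    score = [float(a[2]) for a in anchors]  # a chain may start at any anchor
    prev = [-1] * n
    for j in range(n):
        rj, gj, lj = anchors[j]
        for i in range(j):
            ri, gi, li = anchors[i]
            # Anchor i must end before anchor j starts on both sequences.
            if ri + li > rj or gi + li > gj:
                continue
            # Difference between the two inter-anchor distances = implied indel size.
            gap = abs((gj - (gi + li)) - (rj - (ri + li)))
            candidate = score[i] + lj - gap_cost(gap)
            if candidate > score[j]:
                score[j] = candidate
                prev[j] = i
    best = max(range(n), key=score.__getitem__)
    path = []
    k = best
    while k != -1:
        path.append(anchors[k])
        k = prev[k]
    return score[best], path[::-1]


if __name__ == "__main__":
    # Two 50 bp matches separated by a 1000 bp gap in the genome only,
    # as a large deletion in the read would produce.
    demo = [(0, 0, 50), (50, 1050, 50)]
    print("linear :", chain(demo, linear_gap_cost))
    print("concave:", chain(demo, concave_gap_cost))
```

With these assumed constants, the linear penalty for the 1000 bp gap exceeds the value of the second match, so the chain stops at one anchor; the concave penalty is small enough that both anchors are chained across the gap, which is the behavior the description associates with better structural variant detection.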