Accurate spliced alignment of long RNA sequencing reads

MOTIVATION: Long-read RNA sequencing technologies are establishing themselves as the primary techniques to detect novel isoforms, and many such analyses are dependent on read alignments. However, the error rate and sequencing length of the reads create new challenges for accurately aligning them, pa...

Bibliographic Details
Main authors: Sahlin, Kristoffer; Mäkinen, Veli
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8665758/
https://www.ncbi.nlm.nih.gov/pubmed/34302453
http://dx.doi.org/10.1093/bioinformatics/btab540
_version_ 1784614075026636800
author Sahlin, Kristoffer
Mäkinen, Veli
author_facet Sahlin, Kristoffer
Mäkinen, Veli
author_sort Sahlin, Kristoffer
collection PubMed
description MOTIVATION: Long-read RNA sequencing technologies are establishing themselves as the primary techniques to detect novel isoforms, and many such analyses are dependent on read alignments. However, the error rate and sequencing length of the reads create new challenges for accurately aligning them, particularly around small exons. RESULTS: We present an alignment method uLTRA for long RNA sequencing reads based on a novel two-pass collinear chaining algorithm. We show that uLTRA produces higher accuracy over state-of-the-art aligners with substantially higher accuracy for small exons on simulated and synthetic data. On simulated data, uLTRA achieves an accuracy of about 60% for exons of length 10 nucleotides or smaller and close to 90% accuracy for exons of length between 11 and 20 nucleotides. On biological data where true read location is unknown, we show several examples where uLTRA aligns to known and novel isoforms containing small exons that are not detected with other aligners. While uLTRA obtains its accuracy using annotations, it can also be used as a wrapper around minimap2 to align reads outside annotated regions. AVAILABILITY AND IMPLEMENTATION: uLTRA is available at https://github.com/ksahlin/ultra. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8665758
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-86657582021-12-13 Accurate spliced alignment of long RNA sequencing reads Sahlin, Kristoffer Mäkinen, Veli Bioinformatics Original Papers MOTIVATION: Long-read RNA sequencing technologies are establishing themselves as the primary techniques to detect novel isoforms, and many such analyses are dependent on read alignments. However, the error rate and sequencing length of the reads create new challenges for accurately aligning them, particularly around small exons. RESULTS: We present an alignment method uLTRA for long RNA sequencing reads based on a novel two-pass collinear chaining algorithm. We show that uLTRA produces higher accuracy over state-of-the-art aligners with substantially higher accuracy for small exons on simulated and synthetic data. On simulated data, uLTRA achieves an accuracy of about 60% for exons of length 10 nucleotides or smaller and close to 90% accuracy for exons of length between 11 and 20 nucleotides. On biological data where true read location is unknown, we show several examples where uLTRA aligns to known and novel isoforms containing small exons that are not detected with other aligners. While uLTRA obtains its accuracy using annotations, it can also be used as a wrapper around minimap2 to align reads outside annotated regions. AVAILABILITY AND IMPLEMENTATION: uLTRA is available at https://github.com/ksahlin/ultra. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2021-07-24 /pmc/articles/PMC8665758/ /pubmed/34302453 http://dx.doi.org/10.1093/bioinformatics/btab540 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Papers
Sahlin, Kristoffer
Mäkinen, Veli
Accurate spliced alignment of long RNA sequencing reads
title Accurate spliced alignment of long RNA sequencing reads
title_full Accurate spliced alignment of long RNA sequencing reads
title_fullStr Accurate spliced alignment of long RNA sequencing reads
title_full_unstemmed Accurate spliced alignment of long RNA sequencing reads
title_short Accurate spliced alignment of long RNA sequencing reads
title_sort accurate spliced alignment of long rna sequencing reads
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8665758/
https://www.ncbi.nlm.nih.gov/pubmed/34302453
http://dx.doi.org/10.1093/bioinformatics/btab540
work_keys_str_mv AT sahlinkristoffer accuratesplicedalignmentoflongrnasequencingreads
AT makinenveli accuratesplicedalignmentoflongrnasequencingreads