WarpSTR: determining tandem repeat lengths using raw nanopore signals

Bibliographic Details
Main authors: Sitarčík, Jozef, Vinař, Tomáš, Brejová, Broňa, Krampl, Werner, Budiš, Jaroslav, Radvánszky, Ján, Lucká, Mária
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10307940/
https://www.ncbi.nlm.nih.gov/pubmed/37326967
http://dx.doi.org/10.1093/bioinformatics/btad388
author Sitarčík, Jozef
Vinař, Tomáš
Brejová, Broňa
Krampl, Werner
Budiš, Jaroslav
Radvánszky, Ján
Lucká, Mária
collection PubMed
description MOTIVATION: Short tandem repeats (STRs) are regions of a genome containing many consecutive copies of the same short motif, possibly with small variations. Analysis of STRs has many clinical uses but is limited by sequencing technology, mainly because STRs can exceed the read length used. Nanopore sequencing, one of the long-read sequencing technologies, produces very long reads and thus offers more possibilities to study and analyze STRs. Basecalling of nanopore reads is, however, particularly unreliable in repetitive regions, and therefore direct analysis of raw nanopore data is required. RESULTS: Here, we present WarpSTR, a novel method for characterizing both simple and complex tandem repeats directly from raw nanopore signals using a finite-state automaton and a search algorithm analogous to dynamic time warping. By applying this approach to determine the lengths of 241 STRs, we demonstrate that it decreases the mean absolute error of the STR length estimate compared to basecalling and STRique. AVAILABILITY AND IMPLEMENTATION: WarpSTR is freely available at https://github.com/fmfi-compbio/warpstr
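The description refers to "a search algorithm analogous to dynamic time warping". For readers unfamiliar with the term, the following is a minimal sketch of textbook dynamic time warping (DTW) on two one-dimensional sequences. It is not WarpSTR's algorithm and omits the finite-state automaton entirely; the function name `dtw_distance` and the toy signals are hypothetical and chosen only for illustration.

```python
import math


def dtw_distance(signal, reference):
    """Classic DTW cost between two 1-D sequences (illustrative only)."""
    n, m = len(signal), len(reference)
    # cost[i][j] = minimal cost of aligning signal[:i] with reference[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(signal[i - 1] - reference[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(
                cost[i - 1][j],      # signal sample repeats a reference value
                cost[i][j - 1],      # reference value is skipped over quickly
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Toy example (made-up values): a time-stretched copy of the template
# aligns with zero cost, because DTW tolerates differing dwell times.
stretched = [1.0, 1.0, 2.0, 2.0, 2.0, 3.0]
template = [1.0, 2.0, 3.0]
print(dtw_distance(stretched, template))  # 0.0
```

The property worth noting is that the alignment cost is insensitive to how long each level is held, which is the general reason warping-style alignment suits signals whose per-symbol duration varies.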
format Online
Article
Text
id pubmed-10307940
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10307940 2023-06-30 Bioinformatics Original Paper Oxford University Press 2023-06-16 /pmc/articles/PMC10307940/ /pubmed/37326967 http://dx.doi.org/10.1093/bioinformatics/btad388 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title WarpSTR: determining tandem repeat lengths using raw nanopore signals
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10307940/
https://www.ncbi.nlm.nih.gov/pubmed/37326967
http://dx.doi.org/10.1093/bioinformatics/btad388