Decomposing mosaic tandem repeats accurately from long reads

MOTIVATION: Over the past 30 years, extended tandem repeats (TRs) have been correlated with ∼60 diseases with high odds ratios, and most known TRs consist of single repeat units. However, in the last few years, mosaic TRs composed of different units have been found to be associated with several brai...

Bibliographic Details
Main authors: Masutani, Bansho; Kawahara, Riki; Morishita, Shinichi
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10118999/
https://www.ncbi.nlm.nih.gov/pubmed/37039842
http://dx.doi.org/10.1093/bioinformatics/btad185
collection PubMed
description MOTIVATION: Over the past 30 years, extended tandem repeats (TRs) have been correlated with ∼60 diseases with high odds ratios, and most known TRs consist of single repeat units. However, in the last few years, long-read sequencing techniques have revealed mosaic TRs, composed of different units, that are associated with several brain disorders. Mosaic TRs are difficult-to-characterize sequence configurations that are usually confirmed by manual inspection. Widely used tools are not designed to solve the mosaic TR problem and often fail to decompose mosaic TRs properly. RESULTS: We propose an efficient algorithm that can decompose mosaic TRs in the input string with high sensitivity. Using synthetic benchmark data, we demonstrate that our program, named uTR, outperforms TRF and RepeatMasker in terms of prediction accuracy, especially when mosaic TRs are more complex, and that uTR is faster than TRF and RepeatMasker in most cases. AVAILABILITY AND IMPLEMENTATION: The software program uTR, which implements the proposed algorithm, is available at https://github.com/morisUtokyo/uTR.
id pubmed-10118999
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10118999 2023-04-22 Bioinformatics Original Paper Oxford University Press 2023-04-11 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.