
TRASH: Tandem Repeat Annotation and Structural Hierarchy



Bibliographic details
Main authors: Wlodzimierz, Piotr; Hong, Michael; Henderson, Ian R
Format: Online article (text)
Language: English
Published: Oxford University Press, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10199239/
https://www.ncbi.nlm.nih.gov/pubmed/37162382
http://dx.doi.org/10.1093/bioinformatics/btad308
Collection: PubMed
Abstract:

MOTIVATION: The advent of long-read DNA sequencing is allowing complete assembly of highly repetitive genomic regions for the first time, including the megabase-scale satellite repeat arrays found in many eukaryotic centromeres. The assembly of such repetitive regions creates a need for their de novo annotation, including patterns of higher order repetition. To annotate tandem repeats, methods are required that can be widely applied to diverse genome sequences, without prior knowledge of monomer sequences.

RESULTS: Tandem Repeat Annotation and Structural Hierarchy (TRASH) is a tool that identifies and maps tandem repeats in nucleotide sequence, without prior knowledge of repeat composition. TRASH analyses a FASTA assembly file, identifies regions occupied by repeats and then precisely maps them and their higher order structures. To demonstrate the applicability and scalability of TRASH for centromere research, we apply our method to the recently published Col-CEN genome of Arabidopsis thaliana and the complete human CHM13 genome.

AVAILABILITY AND IMPLEMENTATION: TRASH is freely available at: https://github.com/vlothec/TRASH and supported on Linux.
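The RESULTS summary describes a two-stage workflow: locate the stretches of an assembly occupied by tandem repeats without knowing the monomer, then characterise the repeats inside them. The Python sketch below illustrates that general idea with a simple k-mer recurrence heuristic. It is not TRASH's implementation and omits consensus building, monomer mapping and higher order structure; all names (`parse_fasta`, `repeat_score`, `find_repeat_regions`, `estimate_period`) and parameter values (k = 10, 100-bp windows, 50-bp step, 0.5 threshold) are hypothetical choices made for illustration only.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical sketch: a generic k-mer heuristic, NOT the TRASH algorithm.


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {record name: uppercase sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # record name = first word of header
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def repeat_score(seq: str, k: int) -> float:
    """Fraction of k-mer positions whose k-mer occurs more than once in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    return sum(1 for km in kmers if counts[km] > 1) / len(kmers)


def find_repeat_regions(seq: str, k: int = 10, window: int = 100,
                        step: int = 50, threshold: float = 0.5
                        ) -> List[Tuple[int, int]]:
    """Merge overlapping windows whose repeat_score reaches threshold.

    Returns half-open (start, end) intervals. Windows start every `step`
    bases, so up to step - 1 bases at the sequence tail may go unscanned.
    """
    regions: List[Tuple[int, int]] = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        if repeat_score(seq[start:start + window], k) >= threshold:
            end = min(start + window, len(seq))
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend current region
            else:
                regions.append((start, end))
    return regions


def estimate_period(seq: str, k: int = 10, max_period: int = 2000) -> int:
    """Most common distance between consecutive occurrences of a k-mer.

    In an exact tandem array this distance equals the monomer length;
    distances above max_period are ignored. Returns 0 if nothing recurs.
    """
    last_seen: Dict[str, int] = {}
    distances: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen and i - last_seen[kmer] <= max_period:
            distances[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return distances.most_common(1)[0][0] if distances else 0
```

For example, on a sequence made of random flanks around 30 exact copies of a 20-bp monomer, `find_repeat_regions` should report a region spanning the array, and `estimate_period` on that region should return 20. Real centromeric satellites contain diverged monomers, so a practical tool needs mismatch-tolerant matching rather than the exact k-mer recurrence assumed here.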
Institution: National Center for Biotechnology Information
Source: Bioinformatics, Original Paper. Oxford University Press, published 2023-05-10.
License: © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.