De novo reconstruction of satellite repeat units from sequence data

Satellite DNA are long tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). They are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work f...

Bibliographic Details
Main authors: Zhang, Yujie; Chu, Justin; Cheng, Haoyu; Li, Heng
Format: Online Article Text
Language: English
Published: Cornell University 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153287/
https://www.ncbi.nlm.nih.gov/pubmed/37131874
author Zhang, Yujie
Chu, Justin
Cheng, Haoyu
Li, Heng
collection PubMed
description Satellite DNA are long tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). They are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge on repeat structures. Applying SRF to real sequence data, we showed that SRF could reconstruct known satellites in human and well-studied model organisms. We also found satellite repeats are pervasive in various other species, accounting for up to 12% of their genome contents but are often underrepresented in assemblies. With the rapid progress on genome sequencing, SRF will help the annotation of new genomes and the study of satellite DNA evolution even if such repeats are not fully assembled.
format Online
Article
Text
id pubmed-10153287
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cornell University
record_format MEDLINE/PubMed
spelling pubmed-10153287 2023-05-03 De novo reconstruction of satellite repeat units from sequence data Zhang, Yujie Chu, Justin Cheng, Haoyu Li, Heng ArXiv Article Satellite DNA are long tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). They are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge on repeat structures. Applying SRF to real sequence data, we showed that SRF could reconstruct known satellites in human and well-studied model organisms. We also found satellite repeats are pervasive in various other species, accounting for up to 12% of their genome contents but are often underrepresented in assemblies. With the rapid progress on genome sequencing, SRF will help the annotation of new genomes and the study of satellite DNA evolution even if such repeats are not fully assembled. Cornell University 2023-04-19 /pmc/articles/PMC10153287/ /pubmed/37131874 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title De novo reconstruction of satellite repeat units from sequence data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153287/
https://www.ncbi.nlm.nih.gov/pubmed/37131874