A Pipeline NanoTRF as a New Tool for De Novo Satellite DNA Identification in the Raw Nanopore Sequencing Reads of Plant Genomes

High-copy tandemly organized repeats (TRs), or satellite DNA, are an important but still enigmatic component of eukaryotic genomes. TRs comprise arrays of multi-copy and highly similar tandem repeats, which makes the elucidation of TRs a very challenging task. Oxford Nanopore sequencing data provide...

Bibliographic Details
Main authors: Kirov, Ilya, Kolganova, Elizaveta, Dudnikov, Maxim, Yurkevich, Olga Yu., Amosova, Alexandra V., Muravenko, Olga V.
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9413040/
https://www.ncbi.nlm.nih.gov/pubmed/36015406
http://dx.doi.org/10.3390/plants11162103
author Kirov, Ilya
Kolganova, Elizaveta
Dudnikov, Maxim
Yurkevich, Olga Yu.
Amosova, Alexandra V.
Muravenko, Olga V.
author_facet Kirov, Ilya
Kolganova, Elizaveta
Dudnikov, Maxim
Yurkevich, Olga Yu.
Amosova, Alexandra V.
Muravenko, Olga V.
author_sort Kirov, Ilya
collection PubMed
description High-copy tandemly organized repeats (TRs), or satellite DNA, are an important but still enigmatic component of eukaryotic genomes. TRs comprise arrays of multi-copy and highly similar tandem repeats, which makes the elucidation of TRs a very challenging task. Oxford Nanopore sequencing data provide a valuable source of information on TR organization at the single-molecule level. However, bioinformatics tools for de novo identification of TRs in raw Nanopore data have not been reported so far. We developed NanoTRF, a new Python pipeline for TR identification, characterization and consensus monomer sequence assembly. This new pipeline requires only a raw Nanopore read file from low-depth (<1×) genome sequencing. The program generates an informative HTML report and figures on TR genome abundance, monomer sequence and monomer length. In addition, NanoTRF annotates transposable element (TE) sequences within or near satDNA arrays, and this information can be used to elucidate how TRs and TEs co-evolve in the genome. Moreover, we validated by FISH that the NanoTRF report is useful for evaluating whether the chromosome organization of a TR is clustered or dispersed. Our findings showed that NanoTRF is a robust method for the de novo identification of satellite repeats in raw Nanopore data without prior read assembly. The obtained sequences can be used in many downstream analyses, including genome assembly assistance and gap estimation, chromosome mapping and cytogenetic marker development.
format Online
Article
Text
id pubmed-9413040
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9413040 2022-08-27 A Pipeline NanoTRF as a New Tool for De Novo Satellite DNA Identification in the Raw Nanopore Sequencing Reads of Plant Genomes Kirov, Ilya Kolganova, Elizaveta Dudnikov, Maxim Yurkevich, Olga Yu. Amosova, Alexandra V. Muravenko, Olga V. Plants (Basel) Article High-copy tandemly organized repeats (TRs), or satellite DNA, are an important but still enigmatic component of eukaryotic genomes. TRs comprise arrays of multi-copy and highly similar tandem repeats, which makes the elucidation of TRs a very challenging task. Oxford Nanopore sequencing data provide a valuable source of information on TR organization at the single-molecule level. However, bioinformatics tools for de novo identification of TRs in raw Nanopore data have not been reported so far. We developed NanoTRF, a new Python pipeline for TR identification, characterization and consensus monomer sequence assembly. This new pipeline requires only a raw Nanopore read file from low-depth (<1×) genome sequencing. The program generates an informative HTML report and figures on TR genome abundance, monomer sequence and monomer length. In addition, NanoTRF annotates transposable element (TE) sequences within or near satDNA arrays, and this information can be used to elucidate how TRs and TEs co-evolve in the genome. Moreover, we validated by FISH that the NanoTRF report is useful for evaluating whether the chromosome organization of a TR is clustered or dispersed. Our findings showed that NanoTRF is a robust method for the de novo identification of satellite repeats in raw Nanopore data without prior read assembly. The obtained sequences can be used in many downstream analyses, including genome assembly assistance and gap estimation, chromosome mapping and cytogenetic marker development. MDPI 2022-08-12 /pmc/articles/PMC9413040/ /pubmed/36015406 http://dx.doi.org/10.3390/plants11162103 Text en © 2022 by the authors.
https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Kirov, Ilya
Kolganova, Elizaveta
Dudnikov, Maxim
Yurkevich, Olga Yu.
Amosova, Alexandra V.
Muravenko, Olga V.
A Pipeline NanoTRF as a New Tool for De Novo Satellite DNA Identification in the Raw Nanopore Sequencing Reads of Plant Genomes
title A Pipeline NanoTRF as a New Tool for De Novo Satellite DNA Identification in the Raw Nanopore Sequencing Reads of Plant Genomes
title_full A Pipeline NanoTRF as a New Tool for De Novo Satellite DNA Identification in the Raw Nanopore Sequencing Reads of Plant Genomes
title_fullStr A Pipeline NanoTRF as a New Tool for De Novo Satellite DNA Identification in the Raw Nanopore Sequencing Reads of Plant Genomes
title_full_unstemmed A Pipeline NanoTRF as a New Tool for De Novo Satellite DNA Identification in the Raw Nanopore Sequencing Reads of Plant Genomes
title_short A Pipeline NanoTRF as a New Tool for De Novo Satellite DNA Identification in the Raw Nanopore Sequencing Reads of Plant Genomes
title_sort pipeline nanotrf as a new tool for de novo satellite dna identification in the raw nanopore sequencing reads of plant genomes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9413040/
https://www.ncbi.nlm.nih.gov/pubmed/36015406
http://dx.doi.org/10.3390/plants11162103