DARTS: An Algorithm for Domain-Associated Retrotransposon Search in Genome Assemblies

Retrotransposons comprise a substantial fraction of eukaryotic genomes, reaching the highest proportions in plants. Therefore, identification and annotation of retrotransposons is an important task in studying the regulation and evolution of plant genomes. The majority of computational tools for min...

Bibliographic Details
Main authors: Biryukov, Mikhail; Ustyantsev, Kirill
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8775202/
https://www.ncbi.nlm.nih.gov/pubmed/35052350
http://dx.doi.org/10.3390/genes13010009
author Biryukov, Mikhail
Ustyantsev, Kirill
author_facet Biryukov, Mikhail
Ustyantsev, Kirill
author_sort Biryukov, Mikhail
collection PubMed
description Retrotransposons comprise a substantial fraction of eukaryotic genomes, reaching the highest proportions in plants. Therefore, identification and annotation of retrotransposons is an important task in studying the regulation and evolution of plant genomes. The majority of computational tools for mining transposable elements (TEs) are designed for subsequent genome repeat masking, often leaving aside the element lineage classification and its protein domain composition. Additionally, studies focused on the diversity and evolution of a particular group of retrotransposons often require substantial customization efforts from researchers to adapt existing software to their needs. Here, we developed a computational pipeline to mine sequences of protein-coding retrotransposons based on the sequences of their conserved protein domains—DARTS (Domain-Associated Retrotransposon Search). Using the most abundant group of TEs in plants—long terminal repeat (LTR) retrotransposons (LTR-RTs)—we show that DARTS has radically higher sensitivity for LTR-RT identification compared to the widely accepted tool LTRharvest. DARTS can be easily customized for specific user needs. As a result, DARTS returns a set of structurally annotated nucleotide and amino acid sequences which can be readily used in subsequent comparative and phylogenetic analyses. DARTS may facilitate researchers interested in the discovery and detailed analysis of the diversity and evolution of retrotransposons, LTR-RTs, and other protein-coding TEs.
format Online
Article
Text
id pubmed-8775202
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8775202 2022-01-21 DARTS: An Algorithm for Domain-Associated Retrotransposon Search in Genome Assemblies. Biryukov, Mikhail; Ustyantsev, Kirill. Genes (Basel). Article. MDPI 2021-12-21 /pmc/articles/PMC8775202/ /pubmed/35052350 http://dx.doi.org/10.3390/genes13010009 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title DARTS: An Algorithm for Domain-Associated Retrotransposon Search in Genome Assemblies
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8775202/
https://www.ncbi.nlm.nih.gov/pubmed/35052350
http://dx.doi.org/10.3390/genes13010009