
LtrDetector: A tool-suite for detecting long terminal repeat retrotransposons de-novo

BACKGROUND: Long terminal repeat retrotransposons are the most abundant transposons in plants. They play important roles in alternative splicing, recombination, gene regulation, and defense mechanisms. Large-scale sequencing projects for plant genomes are currently underway. Software tools are impor...


Bibliographic Details
Main Authors: Valencia, Joseph D., Girgis, Hani Z.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547461/
https://www.ncbi.nlm.nih.gov/pubmed/31159720
http://dx.doi.org/10.1186/s12864-019-5796-9
_version_ 1783423681275363328
author Valencia, Joseph D.
Girgis, Hani Z.
author_facet Valencia, Joseph D.
Girgis, Hani Z.
author_sort Valencia, Joseph D.
collection PubMed
description BACKGROUND: Long terminal repeat retrotransposons are the most abundant transposons in plants. They play important roles in alternative splicing, recombination, gene regulation, and defense mechanisms. Large-scale sequencing projects for plant genomes are currently underway. Software tools are important for annotating long terminal repeat retrotransposons in these newly available genomes. However, the available tools are not very sensitive to known elements and perform inconsistently on different genomes. Some are hard to install or obsolete. They may struggle to process large plant genomes. None can be executed in parallel out of the box and very few have features to support visual review of new elements. To overcome these limitations, we developed LtrDetector, which uses techniques inspired by signal-processing. RESULTS: We compared LtrDetector to LTR_Finder and LTRharvest, the two most successful predecessor tools, on six plant genomes. For each organism, we constructed a ground truth data set based on queries from a consensus sequence database. According to this evaluation, LtrDetector was the most sensitive tool, achieving 16–23% improvement in sensitivity over LTRharvest and 21% improvement over LTR_Finder. All three tools had low false positive rates, with LtrDetector achieving 98.2% precision, in between its two competitors. Overall, LtrDetector provides the best compromise between high sensitivity and low false positive rate while requiring moderate time and utilizing memory available on personal computers. CONCLUSIONS: LtrDetector uses a novel methodology revolving around k-mer distributions, which allows it to produce high-quality results using relatively lightweight procedures. It is easy to install and use. It is not species specific, performing well using its default parameters on genomes of varying size and repeat content. It is automatically configured for parallel execution and runs efficiently on an ordinary personal computer. 
It includes a k-mer scores visualization tool to facilitate manual review of the identified elements. These features make LtrDetector an attractive tool for future annotation projects involving long terminal repeat retrotransposons. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5796-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6547461
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6547461 2019-06-06 LtrDetector: A tool-suite for detecting long terminal repeat retrotransposons de-novo Valencia, Joseph D. Girgis, Hani Z. BMC Genomics Software BioMed Central 2019-06-03 /pmc/articles/PMC6547461/ /pubmed/31159720 http://dx.doi.org/10.1186/s12864-019-5796-9 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Valencia, Joseph D.
Girgis, Hani Z.
LtrDetector: A tool-suite for detecting long terminal repeat retrotransposons de-novo
title LtrDetector: A tool-suite for detecting long terminal repeat retrotransposons de-novo
topic Software