Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline
BACKGROUND: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods e...
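Main Authors: | Ou, Shujun; Su, Weija; Liao, Yi; Chougule, Kapeel; Agda, Jireh R. A.; Hellinga, Adam J.; Lugo, Carlos Santiago Blanco; Elliott, Tyler A.; Ware, Doreen; Peterson, Thomas; Jiang, Ning; Hirsch, Candice N.; Hufford, Matthew B. |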
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6913007/ https://www.ncbi.nlm.nih.gov/pubmed/31843001 http://dx.doi.org/10.1186/s13059-019-1905-y |
_version_ | 1783479587758407680 |
---|---|
author | Ou, Shujun Su, Weija Liao, Yi Chougule, Kapeel Agda, Jireh R. A. Hellinga, Adam J. Lugo, Carlos Santiago Blanco Elliott, Tyler A. Ware, Doreen Peterson, Thomas Jiang, Ning Hirsch, Candice N. Hufford, Matthew B. |
author_facet | Ou, Shujun Su, Weija Liao, Yi Chougule, Kapeel Agda, Jireh R. A. Hellinga, Adam J. Lugo, Carlos Santiago Blanco Elliott, Tyler A. Ware, Doreen Peterson, Thomas Jiang, Ning Hirsch, Candice N. Hufford, Matthew B. |
author_sort | Ou, Shujun |
collection | PubMed |
description | BACKGROUND: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations. RESULTS: We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species. CONCLUSIONS: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA. |
format | Online Article Text |
id | pubmed-6913007 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6913007 2019-12-30 Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline Ou, Shujun Su, Weija Liao, Yi Chougule, Kapeel Agda, Jireh R. A. Hellinga, Adam J. Lugo, Carlos Santiago Blanco Elliott, Tyler A. Ware, Doreen Peterson, Thomas Jiang, Ning Hirsch, Candice N. Hufford, Matthew B. Genome Biol Research BACKGROUND: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations. RESULTS: We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species. CONCLUSIONS: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA. BioMed Central 2019-12-16 /pmc/articles/PMC6913007/ /pubmed/31843001 http://dx.doi.org/10.1186/s13059-019-1905-y Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Ou, Shujun Su, Weija Liao, Yi Chougule, Kapeel Agda, Jireh R. A. Hellinga, Adam J. Lugo, Carlos Santiago Blanco Elliott, Tyler A. Ware, Doreen Peterson, Thomas Jiang, Ning Hirsch, Candice N. Hufford, Matthew B. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline |
title | Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline |
title_full | Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline |
title_fullStr | Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline |
title_full_unstemmed | Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline |
title_short | Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline |
title_sort | benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6913007/ https://www.ncbi.nlm.nih.gov/pubmed/31843001 http://dx.doi.org/10.1186/s13059-019-1905-y |
work_keys_str_mv | AT oushujun benchmarkingtransposableelementannotationmethodsforcreationofastreamlinedcomprehensivepipeline AT suweija benchmarkingtransposableelementannotationmethodsforcreationofastreamlinedcomprehensivepipeline AT liaoyi benchmarkingtransposableelementannotationmethodsforcreationofastreamlinedcomprehensivepipeline AT chougulekapeel benchmarkingtransposableelementannotationmethodsforcreationofastreamlinedcomprehensivepipeline AT agdajirehra benchmarkingtransposableelementannotationmethodsforcreationofastreamlinedcomprehensivepipeline AT hellingaadamj benchmarkingtransposableelementannotationmethodsforcreationofastreamlinedcomprehensivepipeline AT lugocarlossantiagoblanco benchmarkingtransposableelementannotationmethodsforcreationofastreamlinedcomprehensivepipeline AT elliotttylera benchmarkingtransposableelementannotationmethodsforcreationofastreamlinedcomprehensivepipeline AT waredoreen benchmarkingtransposableelementannotationmethodsforcreationofastreamlinedcomprehensivepipeline AT petersonthomas benchmarkingtransposableelementannotationmethodsforcreationofastreamlinedcomprehensivepipeline AT jiangning benchmarkingtransposableelementannotationmethodsforcreationofastreamlinedcomprehensivepipeline AT hirschcandicen benchmarkingtransposableelementannotationmethodsforcreationofastreamlinedcomprehensivepipeline AT huffordmatthewb benchmarkingtransposableelementannotationmethodsforcreationofastreamlinedcomprehensivepipeline |
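
The description field above names six performance metrics (sensitivity, specificity, accuracy, precision, FDR, and F1) without defining them. The sketch below shows how they are conventionally computed from a confusion matrix. It assumes the standard textbook definitions; this record does not state how the authors tallied true and false positives, so the class `Confusion`, the function `benchmark_metrics`, and the example counts are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical tallies of annotated units against a curated reference."""
    tp: int  # annotated as TE, and TE in the reference
    fp: int  # annotated as TE, but not TE in the reference
    tn: int  # not annotated, and not TE in the reference
    fn: int  # not annotated, but TE in the reference


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def benchmark_metrics(c: Confusion) -> dict[str, float]:
    """Compute the six metrics using their standard definitions (an assumption)."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "fdr": _ratio(c.fp, c.tp + c.fp),  # equals 1 - precision
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = Confusion(tp=80, fp=20, tn=890, fn=10)
    for name, value in benchmark_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Because FDR is the complement of precision, the two always sum to one; F1 combines precision and sensitivity, so it is a useful single summary when true negatives greatly outnumber true positives.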