Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline

BACKGROUND: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods e...


Bibliographic Details
Main authors: Ou, Shujun, Su, Weija, Liao, Yi, Chougule, Kapeel, Agda, Jireh R. A., Hellinga, Adam J., Lugo, Carlos Santiago Blanco, Elliott, Tyler A., Ware, Doreen, Peterson, Thomas, Jiang, Ning, Hirsch, Candice N., Hufford, Matthew B.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6913007/
https://www.ncbi.nlm.nih.gov/pubmed/31843001
http://dx.doi.org/10.1186/s13059-019-1905-y
author Ou, Shujun
Su, Weija
Liao, Yi
Chougule, Kapeel
Agda, Jireh R. A.
Hellinga, Adam J.
Lugo, Carlos Santiago Blanco
Elliott, Tyler A.
Ware, Doreen
Peterson, Thomas
Jiang, Ning
Hirsch, Candice N.
Hufford, Matthew B.
author_sort Ou, Shujun
collection PubMed
description BACKGROUND: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations. RESULTS: We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species. CONCLUSIONS: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
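The six performance metrics named in the abstract can all be derived from the four cells of a confusion matrix. The sketch below uses the standard textbook definitions; this record does not state how the authors obtain the counts, so the function name `confusion_metrics` and the example counts are hypothetical and purely illustrative.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def confusion_metrics(tp, fp, tn, fn):
    """Compute standard classification metrics from confusion-matrix counts.

    tp, fp, tn, fn: counts of true positives, false positives,
    true negatives, and false negatives (hypothetical inputs).
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),          # TP / (TP + FN)
        "specificity": _ratio(tn, tn + fp),          # TN / (TN + FP)
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),            # TP / (TP + FP)
        "fdr": _ratio(fp, tp + fp),                  # 1 - precision
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),      # harmonic mean of precision and sensitivity
    }


if __name__ == "__main__":
    # Made-up counts, not taken from the paper.
    for name, value in confusion_metrics(tp=80, fp=20, tn=880, fn=20).items():
        print(f"{name}: {value:.4f}")
```

FDR and precision sum to one by definition, so reporting both is redundant but convenient when comparing programs that trade false positives against coverage.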
format Online
Article
Text
id pubmed-6913007
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6913007 2019-12-30 Genome Biol Research BioMed Central 2019-12-16 /pmc/articles/PMC6913007/ /pubmed/31843001 http://dx.doi.org/10.1186/s13059-019-1905-y Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline
topic Research