Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies

Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing...

Bibliographic Details
Main authors: Zeng, Lu, Kortschak, R. Daniel, Raison, Joy M., Bertozzi, Terry, Adelson, David L.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5851578/
https://www.ncbi.nlm.nih.gov/pubmed/29538441
http://dx.doi.org/10.1371/journal.pone.0193588
author Zeng, Lu
Kortschak, R. Daniel
Raison, Joy M.
Bertozzi, Terry
Adelson, David L.
collection PubMed
description Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package.
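The single linkage clustering step named in the description can be illustrated with a small sketch. It is not CARP's code: the function name `single_linkage_families`, the interval-style identifiers and the toy `hits` list are hypothetical, and the input is assumed to be a plain list of pairwise hits already produced by an aligner. Any two sequences joined by a chain of hits end up in the same family, which is the defining property of single linkage.

```python
from collections import defaultdict


def single_linkage_families(pairs):
    """Group sequence IDs into families using union-find.

    Two IDs share a family if a chain of pairwise hits connects them.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families = defaultdict(set)
    for seq_id in list(parent):
        families[find(seq_id)].add(seq_id)

    # Return a deterministic ordering: sorted members, families sorted by first member.
    return sorted((sorted(members) for members in families.values()),
                  key=lambda members: members[0])


# Hypothetical pairwise hits between genomic intervals (illustrative only).
hits = [
    ("chr1:100-400", "chr2:50-350"),
    ("chr2:50-350", "chr5:900-1200"),
    ("chr3:10-210", "chr4:700-900"),
]
families = single_linkage_families(hits)
for family in families:
    print(family)
```

Because the interval identifiers are carried through unchanged, each family keeps the genomic locations of its members, in line with the first advantage the description lists.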
format Online
Article
Text
id pubmed-5851578
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5851578 2018-03-23 PLoS One Research Article Public Library of Science 2018-03-14 /pmc/articles/PMC5851578/ /pubmed/29538441 http://dx.doi.org/10.1371/journal.pone.0193588 Text en © 2018 Zeng et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5851578/
https://www.ncbi.nlm.nih.gov/pubmed/29538441
http://dx.doi.org/10.1371/journal.pone.0193588