
Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies

Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing...

Bibliographic Details
Main Authors:	Zeng, Lu, Kortschak, R. Daniel, Raison, Joy M., Bertozzi, Terry, Adelson, David L.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2018
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5851578/ https://www.ncbi.nlm.nih.gov/pubmed/29538441 http://dx.doi.org/10.1371/journal.pone.0193588

author	Zeng, Lu Kortschak, R. Daniel Raison, Joy M. Bertozzi, Terry Adelson, David L.
author_sort	Zeng, Lu
collection	PubMed
description	Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package.
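The single linkage clustering step named in the description can be illustrated with a minimal sketch. It assumes the pairwise alignment stage has already been reduced to a list of (sequence A, sequence B) hit pairs; that input shape, the function name `single_linkage_families` and the interval labels are hypothetical and are not taken from CARP or krishna. Two sequences end up in the same family whenever a chain of pairwise hits connects them, which is the defining property of single linkage.

```python
from collections import defaultdict


def single_linkage_families(pairs):
    """Group sequences into families by single linkage.

    `pairs` is an iterable of (a, b) tuples, each meaning "a aligned to b".
    This input format is an assumption made for illustration only.
    """
    parent = {}

    def find(x):
        # Register unseen items as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    # Merge the two clusters touched by each pairwise hit.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect members under their final root.
    families = defaultdict(set)
    for item in list(parent):
        families[find(item)].add(item)

    # Sort for a deterministic result.
    return sorted((sorted(members) for members in families.values()),
                  key=lambda members: members[0])


if __name__ == "__main__":
    # Hypothetical genomic intervals, used only as labels.
    hits = [
        ("chr1:100-400", "chr2:50-350"),
        ("chr2:50-350", "chr5:900-1200"),
        ("chr3:10-80", "chr7:10-80"),
    ]
    for family in single_linkage_families(hits):
        print(family)
```

Because one hit is enough to merge two clusters, single linkage can chain unrelated sequences through a shared intermediate; the sketch does not attempt the later filtering and annotation steps the description mentions.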
format	Online Article Text
id	pubmed-5851578
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5851578 2018-03-23 Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies Zeng, Lu Kortschak, R. Daniel Raison, Joy M. Bertozzi, Terry Adelson, David L. PLoS One Research Article Public Library of Science 2018-03-14 /pmc/articles/PMC5851578/ /pubmed/29538441 http://dx.doi.org/10.1371/journal.pone.0193588 Text en © 2018 Zeng et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5851578/ https://www.ncbi.nlm.nih.gov/pubmed/29538441 http://dx.doi.org/10.1371/journal.pone.0193588
