Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies
Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing...
Main Authors: | Zeng, Lu; Kortschak, R. Daniel; Raison, Joy M.; Bertozzi, Terry; Adelson, David L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5851578/ https://www.ncbi.nlm.nih.gov/pubmed/29538441 http://dx.doi.org/10.1371/journal.pone.0193588 |
_version_ | 1783306411692785664 |
---|---|
author | Zeng, Lu Kortschak, R. Daniel Raison, Joy M. Bertozzi, Terry Adelson, David L. |
author_facet | Zeng, Lu Kortschak, R. Daniel Raison, Joy M. Bertozzi, Terry Adelson, David L. |
author_sort | Zeng, Lu |
collection | PubMed |
description | Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package. |
format | Online Article Text |
id | pubmed-5851578 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-58515782018-03-23 Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies Zeng, Lu Kortschak, R. Daniel Raison, Joy M. Bertozzi, Terry Adelson, David L. PLoS One Research Article Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. 
We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package. Public Library of Science 2018-03-14 /pmc/articles/PMC5851578/ /pubmed/29538441 http://dx.doi.org/10.1371/journal.pone.0193588 Text en © 2018 Zeng et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Zeng, Lu Kortschak, R. Daniel Raison, Joy M. Bertozzi, Terry Adelson, David L. Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies |
title | Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies |
title_full | Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies |
title_fullStr | Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies |
title_full_unstemmed | Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies |
title_short | Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies |
title_sort | superior ab initio identification, annotation and characterisation of tes and segmental duplications from genome assemblies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5851578/ https://www.ncbi.nlm.nih.gov/pubmed/29538441 http://dx.doi.org/10.1371/journal.pone.0193588 |
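
The description field above states that, after pairwise alignment, single linkage clustering is carried out to produce families of repetitive elements. As a rough illustration of that one step only, the sketch below groups sequence identifiers with a union-find structure, so that any two sequences connected by a chain of pairwise hits land in the same family. This is not CARP's actual implementation: the function name `single_linkage_families`, the input format (a list of ID pairs) and the example IDs are all hypothetical, and the record does not describe how krishna reports its alignments.

```python
from collections import defaultdict


def single_linkage_families(pairs):
    """Group IDs into families by single linkage.

    `pairs` is an iterable of (id_a, id_b) tuples, one per pairwise
    alignment hit (hypothetical input format, assumed for illustration).
    Two IDs share a family if a chain of hits connects them.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root.
        parent.setdefault(x, x)
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Merge the two families.
            parent[root_b] = root_a

    families = defaultdict(set)
    for seq_id in list(parent):
        families[find(seq_id)].add(seq_id)

    # Sort members and families so the output order is deterministic.
    return sorted(sorted(members) for members in families.values())


# Hypothetical hits: seq1-seq2 and seq2-seq3 chain into one family.
hits = [("seq1", "seq2"), ("seq2", "seq3"), ("seq4", "seq5")]
families = single_linkage_families(hits)
print(families)
```

Single linkage merges families on any one connecting hit, so `seq1` and `seq3` end up together here even though they were never aligned to each other directly.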