
Discovery and characterization of Alu repeat sequences via precise local read assembly

Alu insertions have contributed to >11% of the human genome and ∼30–35 Alu subfamilies remain actively mobile, yet the characterization of polymorphic Alu insertions from short-read data remains a challenge. We build on existing computational methods to combine Alu detection and de novo assembly...


Bibliographic Details
Main authors: Wildschutte, Julia H., Baron, Alayna, Diroff, Nicolette M., Kidd, Jeffrey M.
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4666360/
https://www.ncbi.nlm.nih.gov/pubmed/26503250
http://dx.doi.org/10.1093/nar/gkv1089
author Wildschutte, Julia H.
Baron, Alayna
Diroff, Nicolette M.
Kidd, Jeffrey M.
author_sort Wildschutte, Julia H.
collection PubMed
description Alu insertions have contributed to >11% of the human genome and ∼30–35 Alu subfamilies remain actively mobile, yet the characterization of polymorphic Alu insertions from short-read data remains a challenge. We build on existing computational methods to combine Alu detection and de novo assembly of WGS data as a means to reconstruct the full sequence of insertion events from Illumina paired end reads. Comparison with published calls obtained using PacBio long-reads indicates a false discovery rate below 5%, at the cost of reduced sensitivity due to the colocation of reference and non-reference repeats. We generate a highly accurate call set of 1614 completely assembled Alu variants from 53 samples from the Human Genome Diversity Project (HGDP) panel. We utilize the reconstructed alternative insertion haplotypes to genotype 1010 fully assembled insertions, obtaining >99% agreement with genotypes obtained by PCR. In our assembled sequences, we find evidence of premature insertion mechanisms and observe 5′ truncation in 16% of AluYa5 and AluYb8 insertions. The sites of truncation coincide with stem-loop structures and SRP9/14 binding sites in the Alu RNA, implicating L1 ORF2p pausing in the generation of 5′ truncations. Additionally, we identified variable AluJ and AluS elements that likely arose due to non-retrotransposition mechanisms.
format Online
Article
Text
id pubmed-4666360
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4666360 2015-12-02 Nucleic Acids Res Genomics Oxford University Press 2015-12-02 2015-10-25 /pmc/articles/PMC4666360/ /pubmed/26503250 http://dx.doi.org/10.1093/nar/gkv1089 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Discovery and characterization of Alu repeat sequences via precise local read assembly
topic Genomics