TypeTE: a tool to genotype mobile element insertions from whole genome resequencing data

Bibliographic Details
Main Authors: Goubert, Clément; Thomas, Jainy; Payer, Lindsay M; Kidd, Jeffrey M; Feusier, Julie; Watkins, W Scott; Burns, Kathleen H; Jorde, Lynn B; Feschotte, Cédric
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7102983/
https://www.ncbi.nlm.nih.gov/pubmed/32067044
http://dx.doi.org/10.1093/nar/gkaa074
author Goubert, Clément
Thomas, Jainy
Payer, Lindsay M
Kidd, Jeffrey M
Feusier, Julie
Watkins, W Scott
Burns, Kathleen H
Jorde, Lynn B
Feschotte, Cédric
author_sort Goubert, Clément
collection PubMed
description Alu retrotransposons account for more than 10% of the human genome, and insertions of these elements create structural variants segregating in human populations. Such polymorphic Alus are powerful markers to understand population structure, and they represent variants that can greatly impact genome function, including gene expression. Accurate genotyping of Alus and other mobile elements has been challenging. Indeed, we found that Alu genotypes previously called for the 1000 Genomes Project are sometimes erroneous, which poses significant problems for phasing these insertions with other variants that comprise the haplotype. To ameliorate this issue, we introduce a new pipeline – TypeTE – which genotypes Alu insertions from whole-genome sequencing data. Starting from a list of polymorphic Alus, TypeTE identifies the hallmarks (poly-A tail and target site duplication) and orientation of Alu insertions using local re-assembly to reconstruct presence and absence alleles. Genotype likelihoods are then computed after re-mapping sequencing reads to the reconstructed alleles. Using a high-quality set of PCR-based genotyping of >200 loci, we show that TypeTE improves genotype accuracy from 83% to 92% in the 1000 Genomes dataset. TypeTE can be readily adapted to other retrotransposon families and brings a valuable toolbox addition for population genomics.
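The genotype-likelihood step mentioned in the description can be illustrated with a minimal sketch. It assumes a simple binomial read-count model with a fixed per-read error rate. This is a common textbook formulation, not necessarily the model TypeTE implements, and the names `genotype_log_likelihoods`, `call_genotype`, and `error_rate` are hypothetical.

```python
import math


def genotype_log_likelihoods(n_absence, n_presence, error_rate=0.01):
    """Return log10 likelihoods for the three diploid genotypes at one locus.

    n_absence  -- reads supporting the absence (empty site) allele
    n_presence -- reads supporting the presence (Alu insertion) allele
    error_rate -- assumed probability that a read supports the wrong allele

    Illustrative binomial model only; the binomial coefficient is omitted
    because it is identical for all three genotypes.
    """
    if n_absence < 0 or n_presence < 0:
        raise ValueError("read counts must be non-negative")
    if not 0.0 < error_rate < 0.5:
        raise ValueError("error_rate must be between 0 and 0.5")

    # Probability that a single read supports the presence allele,
    # conditional on the true genotype (0 = absence, 1 = presence).
    p_presence = {
        "0/0": error_rate,
        "0/1": 0.5,
        "1/1": 1.0 - error_rate,
    }
    return {
        genotype: n_presence * math.log10(p) + n_absence * math.log10(1.0 - p)
        for genotype, p in p_presence.items()
    }


def call_genotype(n_absence, n_presence, error_rate=0.01):
    """Return the genotype with the highest likelihood under the model above."""
    likelihoods = genotype_log_likelihoods(n_absence, n_presence, error_rate)
    return max(likelihoods, key=likelihoods.get)


if __name__ == "__main__":
    # Example: 6 reads map to the absence allele and 5 to the presence allele.
    print(genotype_log_likelihoods(6, 5))
    print(call_genotype(6, 5))
```

In this sketch the read counts would come from re-mapping reads to the two reconstructed alleles. A real pipeline would typically also weight each read by its mapping and base qualities.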
format Online
Article
Text
id pubmed-7102983
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7102983 2020-04-02 Nucleic Acids Res Methods Online Oxford University Press 2020-04-06 2020-02-18 /pmc/articles/PMC7102983/ /pubmed/32067044 http://dx.doi.org/10.1093/nar/gkaa074 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title TypeTE: a tool to genotype mobile element insertions from whole genome resequencing data
topic Methods Online