
RelocaTE2: a high resolution transposable element insertion site mapping tool for population resequencing

BACKGROUND: Transposable element (TE) polymorphisms are important components of population genetic variation. The functional impacts of TEs in gene regulation and generating genetic diversity have been observed in multiple species, but the frequency and magnitude of TE variation are underappreciated...


Bibliographic Details
Main Authors: Chen, Jinfeng; Wrightsman, Travis R.; Wessler, Susan R.; Stajich, Jason E.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2017
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5274521/
https://www.ncbi.nlm.nih.gov/pubmed/28149701
http://dx.doi.org/10.7717/peerj.2942
author Chen, Jinfeng
Wrightsman, Travis R.
Wessler, Susan R.
Stajich, Jason E.
collection PubMed
description BACKGROUND: Transposable element (TE) polymorphisms are important components of population genetic variation. The functional impacts of TEs in gene regulation and generating genetic diversity have been observed in multiple species, but the frequency and magnitude of TE variation are underappreciated. Inexpensive and deep sequencing technology has made it affordable to apply population genetic methods to whole genomes with methods that identify single nucleotide and insertion/deletion polymorphisms. However, identifying TE polymorphisms, particularly transposition events or non-reference insertion sites, can be challenging due to the repetitive nature of these sequences, which hampers both the sensitivity and specificity of analysis tools. METHODS: We have developed the tool RelocaTE2 for identification of TE insertion sites with high sensitivity and specificity. RelocaTE2 searches for known TE sequences in whole genome sequencing reads from second generation sequencing platforms such as Illumina. These sequence reads are used as seeds to pinpoint chromosome locations where TEs have transposed. RelocaTE2 detects target site duplication (TSD) of TE insertions, allowing it to report TE polymorphism loci with single base pair precision. RESULTS AND DISCUSSION: The performance of RelocaTE2 is evaluated using both simulated and real sequence data. RelocaTE2 demonstrates a high level of sensitivity and specificity, particularly when the sequence coverage is not shallow. In comparison to other tools tested, RelocaTE2 achieves the best balance between sensitivity and specificity. In particular, RelocaTE2 performs best in prediction of TSDs for TE insertions. Even in highly repetitive regions, such as those tested on rice chromosome 4, RelocaTE2 is able to report up to 95% of simulated TE insertions with less than 0.1% false positive rate using 10-fold genome coverage resequencing data. RelocaTE2 provides a robust solution to identify TE insertion sites and can be incorporated into analysis workflows in support of describing the complete genotype from light coverage genome sequencing.
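The abstract reports results as the share of simulated insertions recovered and the share of calls that are false. As a rough illustration of how such figures could be computed, here is a minimal Python sketch. It is not RelocaTE2 code: the function names, the `(chromosome, position)` site representation, the `tolerance` parameter, and the definition of the false-call fraction (predicted sites with no matching simulated site, divided by all predicted sites) are all assumptions made for this example, and the coordinates are invented.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, 1-based insertion position).
Site = Tuple[str, int]


def matches(a: Site, b: Site, tolerance: int) -> bool:
    """True if both sites are on the same chromosome within `tolerance` bp."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= tolerance


def evaluate_calls(
    simulated: List[Site], predicted: List[Site], tolerance: int = 0
) -> Tuple[float, float]:
    """Return (sensitivity, false-call fraction) for a set of predictions.

    Sensitivity: simulated sites matched by at least one prediction.
    False-call fraction (assumed definition): predictions matching no
    simulated site, divided by the number of predictions.
    """
    recovered = sum(
        1 for s in simulated if any(matches(s, p, tolerance) for p in predicted)
    )
    false_calls = sum(
        1 for p in predicted if not any(matches(p, s, tolerance) for s in simulated)
    )
    sensitivity = recovered / len(simulated) if simulated else 0.0
    false_fraction = false_calls / len(predicted) if predicted else 0.0
    return sensitivity, false_fraction


# Invented toy data, for illustration only.
simulated = [("Chr4", 1000), ("Chr4", 2500), ("Chr4", 7200), ("Chr4", 9100)]
predicted = [("Chr4", 1000), ("Chr4", 2501), ("Chr4", 7200), ("Chr4", 8000)]

exact = evaluate_calls(simulated, predicted)                 # single-bp matching
relaxed = evaluate_calls(simulated, predicted, tolerance=2)  # allow 2 bp offset
print(exact, relaxed)
```

With `tolerance=0` a call that is off by one base pair counts as both a missed site and a false call, which is why exact breakpoint reporting matters for this kind of benchmark.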
format Online
Article
Text
id pubmed-5274521
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-5274521 2017-02-01 RelocaTE2: a high resolution transposable element insertion site mapping tool for population resequencing Chen, Jinfeng Wrightsman, Travis R. Wessler, Susan R. Stajich, Jason E. PeerJ Bioinformatics PeerJ Inc. 2017-01-26 /pmc/articles/PMC5274521/ /pubmed/28149701 http://dx.doi.org/10.7717/peerj.2942 Text en ©2017 Chen et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title RelocaTE2: a high resolution transposable element insertion site mapping tool for population resequencing
topic Bioinformatics