TranSurVeyor: an improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data

Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. They are important structural variations that have clinical impact. Transpositions can be c...

Bibliographic Details
Main authors: Rajaby, Ramesh; Sung, Wing-Kin
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6237741/
https://www.ncbi.nlm.nih.gov/pubmed/30137425
http://dx.doi.org/10.1093/nar/gky685
author Rajaby, Ramesh
Sung, Wing-Kin
collection PubMed
description Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. Transpositions are important structural variations that have clinical impact. They can be called by analyzing second-generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however, this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to the misalignment of the reads: since reads are short and the human genome has many repeats, false alignments create false positive predictions, while missing alignments reduce the true positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling: first, we propose a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, we propose an SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques with other techniques such as clustering and a positive-to-negative ratio filter, our proposed transposition caller TranSurVeyor shows at least a 3.1-fold improvement in terms of F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as that of existing database-based methods such as MELT, Mobster and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not known in the current database.
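The abstract reports its headline result as a fold improvement in F1-score, which combines the two failure modes it names (false positives from false alignments, missed calls from missing alignments). The following is a minimal sketch of that arithmetic only. The function name `f1_score` and all counts are hypothetical illustrations, not values or code from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score, the harmonic mean of precision and recall.

    tp: true positive calls, fp: false positive calls, fn: missed calls.
    """
    if tp == 0:
        # No correct calls: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # fraction of reported calls that are real
    recall = tp / (tp + fn)     # fraction of real events that were reported
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two callers evaluated on the same 100 true events.
baseline = f1_score(tp=20, fp=60, fn=80)   # many false and missed calls
improved = f1_score(tp=70, fp=20, fn=30)   # fewer of both

# Fold improvement is the ratio of the two F1-scores.
fold_improvement = improved / baseline
print(f"baseline F1={baseline:.3f}, improved F1={improved:.3f}, "
      f"fold={fold_improvement:.2f}")
```

Because F1 penalizes both false positives and false negatives, a caller has to improve precision and recall together to achieve a large fold change.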
format Online
Article
Text
id pubmed-6237741
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6237741 2018-11-21
TranSurVeyor: an improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data
Rajaby, Ramesh
Sung, Wing-Kin
Nucleic Acids Res
Methods Online
Oxford University Press 2018-11-16 2018-08-22
/pmc/articles/PMC6237741/ /pubmed/30137425 http://dx.doi.org/10.1093/nar/gky685
Text en
© The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by-nc/4.0/
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title TranSurVeyor: an improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data
topic Methods Online