SVIM: structural variant identification using mapped long reads

Bibliographic Details
Main authors: Heller, David; Vingron, Martin
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6735718/
https://www.ncbi.nlm.nih.gov/pubmed/30668829
http://dx.doi.org/10.1093/bioinformatics/btz041
_version_ 1783450399950241792
author Heller, David
Vingron, Martin
author_facet Heller, David
Vingron, Martin
author_sort Heller, David
collection PubMed
description MOTIVATION: Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than single-nucleotide polymorphisms or small insertions and deletions. Additionally, they have a great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long-read, single-molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities. RESULTS: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from Pacific Biosciences and Nanopore sequencing machines. AVAILABILITY AND IMPLEMENTATION: The source code and executables of SVIM are available on GitHub: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6735718
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6735718 2019-09-16 SVIM: structural variant identification using mapped long reads Heller, David Vingron, Martin Bioinformatics Original Papers Oxford University Press 2019-09-01 2019-01-21 /pmc/articles/PMC6735718/ /pubmed/30668829 http://dx.doi.org/10.1093/bioinformatics/btz041 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title SVIM: structural variant identification using mapped long reads
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6735718/
https://www.ncbi.nlm.nih.gov/pubmed/30668829
http://dx.doi.org/10.1093/bioinformatics/btz041
work_keys_str_mv AT hellerdavid svimstructuralvariantidentificationusingmappedlongreads
AT vingronmartin svimstructuralvariantidentificationusingmappedlongreads