
Fast and SNP-tolerant detection of complex variants and splicing in short reads

Motivation: Next-generation sequencing captures sequence differences in reads relative to a reference genome or transcriptome, including splicing events and complex variants involving multiple mismatches and long indels. We present computational methods for fast detection of complex variants and spl...


Bibliographic Details
Main Authors: Wu, Thomas D., Nacu, Serban
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2844994/
https://www.ncbi.nlm.nih.gov/pubmed/20147302
http://dx.doi.org/10.1093/bioinformatics/btq057
author Wu, Thomas D.
Nacu, Serban
collection PubMed
description Motivation: Next-generation sequencing captures sequence differences in reads relative to a reference genome or transcriptome, including splicing events and complex variants involving multiple mismatches and long indels. We present computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained search process of merging and filtering position lists from a genomic index. Our methods are implemented in GSNAP (Genomic Short-read Nucleotide Alignment Program), which can align both single- and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and long-distance splicing, including interchromosomal splicing, in individual reads, using probabilistic models or a database of known splice sites. Our program also permits SNP-tolerant alignment to a reference space of all possible combinations of major and minor alleles, and can align reads from bisulfite-treated DNA for the study of methylation state.

Results: In comparison testing, GSNAP has speeds comparable to existing programs, especially in reads of ≥70 nt and is fastest in detecting complex variants with four or more mismatches or insertions of 1–9 nt and deletions of 1–30 nt. Although SNP tolerance does not increase alignment yield substantially, it affects alignment results in 7–8% of transcriptional reads, typically by revealing alternate genomic mappings for a read. Simulations of bisulfite-converted DNA show a decrease in identifying genomic positions uniquely in 6% of 36 nt reads and 3% of 70 nt reads.

Availability: Source code in C and utility programs in Perl are freely available for download as part of the GMAP package at http://share.gene.com/gmap.

Contact: twu@gene.com
format Text
id pubmed-2844994
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2844994 2010-03-29 Fast and SNP-tolerant detection of complex variants and splicing in short reads Wu, Thomas D. Nacu, Serban Bioinformatics Original Papers Oxford University Press 2010-04-01 2010-02-10 /pmc/articles/PMC2844994/ /pubmed/20147302 http://dx.doi.org/10.1093/bioinformatics/btq057 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Fast and SNP-tolerant detection of complex variants and splicing in short reads
topic Original Papers