
PyroHMMsnp: an SNP caller for Ion Torrent and 454 sequencing data

Both 454 and Ion Torrent sequencers are capable of producing large amounts of long high-quality sequencing reads. However, as both methods sequence homopolymers in one cycle, they both suffer from homopolymer uncertainty and incorporation asynchronization. In mapping, such sequencing errors could sh...


Bibliographic Details
Main authors: Zeng, Feng; Jiang, Rui; Chen, Ting
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3711422/
https://www.ncbi.nlm.nih.gov/pubmed/23700313
http://dx.doi.org/10.1093/nar/gkt372
author Zeng, Feng
Jiang, Rui
Chen, Ting
collection PubMed
description Both 454 and Ion Torrent sequencers are capable of producing large amounts of long high-quality sequencing reads. However, as both methods sequence homopolymers in one cycle, they both suffer from homopolymer uncertainty and incorporation asynchronization. In mapping, such sequencing errors could shift alignments around homopolymers and thus induce incorrect mismatches, which have become a critical barrier against the accurate detection of single nucleotide polymorphisms (SNPs). In this article, we propose a hidden Markov model (HMM) to statistically and explicitly formulate homopolymer sequencing errors by the overcall, undercall, insertion and deletion. We use a hierarchical model to describe the sequencing and base-calling processes, and we estimate parameters of the HMM from resequencing data by an expectation-maximization algorithm. Based on the HMM, we develop a realignment-based SNP-calling program, termed PyroHMMsnp, which realigns read sequences around homopolymers according to the error model and then infers the underlying genotype by using a Bayesian approach. Simulation experiments show that the performance of PyroHMMsnp is exceptional across various sequencing coverages in terms of sensitivity, specificity and F1 measure, compared with other tools. Analysis of the human resequencing data shows that PyroHMMsnp predicts 12.9% more SNPs than Samtools while achieving a higher specificity. Project page: http://code.google.com/p/pyrohmmsnp/.
format Online
Article
Text
id pubmed-3711422
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3711422 2013-07-15 PyroHMMsnp: an SNP caller for Ion Torrent and 454 sequencing data Zeng, Feng Jiang, Rui Chen, Ting Nucleic Acids Res Methods Online Oxford University Press 2013-07 2013-05-21 /pmc/articles/PMC3711422/ /pubmed/23700313 http://dx.doi.org/10.1093/nar/gkt372 Text en © The Author(s) 2013. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title PyroHMMsnp: an SNP caller for Ion Torrent and 454 sequencing data
topic Methods Online
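The abstract states that PyroHMMsnp infers the underlying genotype with a Bayesian approach after realignment. As a rough illustration of that last step only, the sketch below computes posterior probabilities for the three diploid genotypes at one site from counts of reads supporting the reference and alternate alleles. It is a generic textbook calculation, not the PyroHMMsnp model: it ignores the HMM realignment entirely, and the function name `genotype_posteriors`, the flat per-base `error_rate`, and the prior values are all assumptions chosen for illustration.

```python
import math

# Diploid genotypes at a biallelic site, ordered by number of alternate alleles.
GENOTYPES = ("ref/ref", "ref/alt", "alt/alt")


def genotype_posteriors(n_ref, n_alt, error_rate=0.01,
                        priors=(0.998, 0.001, 0.001)):
    """Posterior over diploid genotypes given allele-supporting read counts.

    Hypothetical helper: assumes independent reads, a single per-base
    error_rate strictly between 0 and 1, and illustrative priors.
    """
    log_post = []
    for alt_copies, prior in zip((0, 1, 2), priors):
        alt_fraction = alt_copies / 2
        # Probability that one read shows the alternate allele: either it was
        # sampled from an alt chromosome and read correctly, or sampled from
        # a ref chromosome and misread.
        p_alt = alt_fraction * (1 - error_rate) + (1 - alt_fraction) * error_rate
        log_lik = n_alt * math.log(p_alt) + n_ref * math.log(1 - p_alt)
        log_post.append(math.log(prior) + log_lik)

    # Normalise in log space to avoid underflow at high coverage.
    peak = max(log_post)
    weights = [math.exp(value - peak) for value in log_post]
    total = sum(weights)
    return {g: w / total for g, w in zip(GENOTYPES, weights)}


if __name__ == "__main__":
    for counts in [(10, 0), (5, 5), (0, 10)]:
        post = genotype_posteriors(*counts)
        best = max(post, key=post.get)
        print(counts, best, round(post[best], 4))
```

With balanced support for both alleles the heterozygous genotype dominates despite its low prior, while a site with only alternate-supporting reads is called homozygous alternate. A realignment-based caller would feed this kind of calculation with likelihoods derived from its error model rather than a flat error rate.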