SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data

We develop a statistical tool SNVer for calling common and rare variants in analysis of pooled or individual next-generation sequencing (NGS) data. We formulate variant calling as a hypothesis testing problem and employ a binomial–binomial model to test the significance of observed allele frequency...
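
The hypothesis-testing idea in the summary can be illustrated with a much simpler stand-in than SNVer's binomial–binomial model. The sketch below is not the authors' method: it assumes a single binomial model in which every non-reference read at a locus is a sequencing error with a fixed probability, computes a one-sided P-value for the observed alternate-allele count, and applies a Bonferroni threshold across loci. The function names, the 1% error rate, and the 0.05 level are all hypothetical choices made for illustration.

```python
from math import exp, lgamma, log


def binomial_tail_pvalue(alt_reads: int, depth: int, error_rate: float) -> float:
    """One-sided P-value P(X >= alt_reads) for X ~ Binomial(depth, error_rate).

    Null hypothesis (an assumption of this sketch, not SNVer's model):
    every alternate-allele read is an independent sequencing error.
    """
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must lie between 0 and depth")
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")

    def log_pmf(k: int) -> float:
        # log of C(depth, k) * p^k * (1 - p)^(depth - k), computed in
        # log-space so that large depths do not overflow.
        log_choose = lgamma(depth + 1) - lgamma(k + 1) - lgamma(depth - k + 1)
        return log_choose + k * log(error_rate) + (depth - k) * log(1.0 - error_rate)

    tail = sum(exp(log_pmf(k)) for k in range(alt_reads, depth + 1))
    return min(1.0, tail)


def call_variants(loci, error_rate=0.01, alpha=0.05):
    """Test each (name, alt_reads, depth) locus with a Bonferroni threshold.

    Returns a list of (name, p_value, is_called) tuples.
    """
    if not loci:
        return []
    threshold = alpha / len(loci)  # Bonferroni correction over all tested loci
    results = []
    for name, alt_reads, depth in loci:
        p_value = binomial_tail_pvalue(alt_reads, depth, error_rate)
        results.append((name, p_value, p_value <= threshold))
    return results


if __name__ == "__main__":
    # Hypothetical read counts: (locus name, alternate reads, total depth).
    example_loci = [("locus_a", 30, 100), ("locus_b", 1, 100)]
    for name, p_value, is_called in call_variants(example_loci):
        print(f"{name}\tP={p_value:.3g}\tcalled={is_called}")
```

Because each locus gets a P-value rather than a yes/no decision, the threshold can be swapped for any other multiple-testing procedure without changing the per-locus test.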

Bibliographic Details
Main authors: Wei, Zhi, Wang, Wei, Hu, Pingzhao, Lyon, Gholson J., Hakonarson, Hakon
Format: Online Article Text
Language: English
Published: Oxford University Press 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3201884/
https://www.ncbi.nlm.nih.gov/pubmed/21813454
http://dx.doi.org/10.1093/nar/gkr599
_version_ 1782214932963524608
author Wei, Zhi
Wang, Wei
Hu, Pingzhao
Lyon, Gholson J.
Hakonarson, Hakon
author_facet Wei, Zhi
Wang, Wei
Hu, Pingzhao
Lyon, Gholson J.
Hakonarson, Hakon
author_sort Wei, Zhi
collection PubMed
description We develop a statistical tool SNVer for calling common and rare variants in analysis of pooled or individual next-generation sequencing (NGS) data. We formulate variant calling as a hypothesis testing problem and employ a binomial–binomial model to test the significance of observed allele frequency against sequencing error. SNVer reports a single overall P-value for evaluating the significance of a candidate locus being a variant, based on which multiplicity control can be obtained. This is particularly desirable because tens of thousands of loci are simultaneously examined in typical NGS experiments. Each user can choose the false-positive error rate threshold he or she considers appropriate, instead of just the dichotomous decisions of whether to ‘accept or reject the candidates’ provided by most existing methods. We use both simulated data and real data to demonstrate the superior performance of our program in comparison with existing methods. SNVer runs very fast and can complete testing 300 K loci within an hour. This excellent scalability makes it feasible for analysis of whole-exome sequencing data, or even whole-genome sequencing data using a high-performance computing cluster. SNVer is freely available at http://snver.sourceforge.net/.
format Online
Article
Text
id pubmed-3201884
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3201884 2011-10-26 SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data Wei, Zhi Wang, Wei Hu, Pingzhao Lyon, Gholson J. Hakonarson, Hakon Nucleic Acids Res Methods Online We develop a statistical tool SNVer for calling common and rare variants in analysis of pooled or individual next-generation sequencing (NGS) data. We formulate variant calling as a hypothesis testing problem and employ a binomial–binomial model to test the significance of observed allele frequency against sequencing error. SNVer reports a single overall P-value for evaluating the significance of a candidate locus being a variant, based on which multiplicity control can be obtained. This is particularly desirable because tens of thousands of loci are simultaneously examined in typical NGS experiments. Each user can choose the false-positive error rate threshold he or she considers appropriate, instead of just the dichotomous decisions of whether to ‘accept or reject the candidates’ provided by most existing methods. We use both simulated data and real data to demonstrate the superior performance of our program in comparison with existing methods. SNVer runs very fast and can complete testing 300 K loci within an hour. This excellent scalability makes it feasible for analysis of whole-exome sequencing data, or even whole-genome sequencing data using a high-performance computing cluster. SNVer is freely available at http://snver.sourceforge.net/. Oxford University Press 2011-10 2011-08-03 /pmc/articles/PMC3201884/ /pubmed/21813454 http://dx.doi.org/10.1093/nar/gkr599 Text en © The Author(s) 2011. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methods Online
Wei, Zhi
Wang, Wei
Hu, Pingzhao
Lyon, Gholson J.
Hakonarson, Hakon
SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data
title SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data
title_full SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data
title_fullStr SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data
title_full_unstemmed SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data
title_short SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data
title_sort snver: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3201884/
https://www.ncbi.nlm.nih.gov/pubmed/21813454
http://dx.doi.org/10.1093/nar/gkr599
work_keys_str_mv AT weizhi snverastatisticaltoolforvariantcallinginanalysisofpooledorindividualnextgenerationsequencingdata
AT wangwei snverastatisticaltoolforvariantcallinginanalysisofpooledorindividualnextgenerationsequencingdata
AT hupingzhao snverastatisticaltoolforvariantcallinginanalysisofpooledorindividualnextgenerationsequencingdata
AT lyongholsonj snverastatisticaltoolforvariantcallinginanalysisofpooledorindividualnextgenerationsequencingdata
AT hakonarsonhakon snverastatisticaltoolforvariantcallinginanalysisofpooledorindividualnextgenerationsequencingdata