An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data
Next-generation sequencing is a powerful approach for discovering genetic variation. Sensitive variant calling and haplotype inference from population sequencing data remain challenging. We describe methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage (approximately 5...
Main authors: | Wang, Yi; Lu, James; Yu, Jin; Gibbs, Richard A.; Yu, Fuli |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press, 2013 |
Subjects: | Method |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3638139/ https://www.ncbi.nlm.nih.gov/pubmed/23296920 http://dx.doi.org/10.1101/gr.146084.112 |
_version_ | 1782475800894767104 |
---|---|
author | Wang, Yi; Lu, James; Yu, Jin; Gibbs, Richard A.; Yu, Fuli |
collection | PubMed |
description | Next-generation sequencing is a powerful approach for discovering genetic variation. Sensitive variant calling and haplotype inference from population sequencing data remain challenging. We describe methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage (approximately 5×) sequencing of populations, implemented in a pipeline called SNPTools. Our pipeline contains several innovations that specifically address challenges caused by low-coverage population sequencing: (1) effective base depth (EBD), a nonparametric statistic that enables more accurate statistical modeling of sequencing data; (2) variance ratio scoring, a variance-based statistic that discovers polymorphic loci with high sensitivity and specificity; and (3) BAM-specific binomial mixture modeling (BBMM), a clustering algorithm that generates robust genotype likelihoods from heterogeneous sequencing data. Last, we develop an imputation engine that refines raw genotype likelihoods to produce high-quality phased genotypes/haplotypes. Designed for large population studies, SNPTools' input/output (I/O) and storage-aware design leads to improved computing performance on large sequencing data sets. We apply SNPTools to the International 1000 Genomes Project (1000G) Phase 1 low-coverage data set and obtain genotyping accuracy comparable to that of SNP microarrays. |
format | Online Article Text |
id | pubmed-3638139 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3638139 2013-11-01 Genome Res Method Cold Spring Harbor Laboratory Press 2013-05 /pmc/articles/PMC3638139/ /pubmed/23296920 http://dx.doi.org/10.1101/gr.146084.112 Text en © 2013, Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/3.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/. |
title | An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data |
topic | Method |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3638139/ https://www.ncbi.nlm.nih.gov/pubmed/23296920 http://dx.doi.org/10.1101/gr.146084.112 |