Cargando…

A statistical method for the detection of variants from next-generation resequencing of DNA pools

Motivation: Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic re...

Descripción completa

Detalles Bibliográficos
Autor principal: Bansal, Vikas
Formato: Texto
Lenguaje:English
Publicado: Oxford University Press 2010
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881398/
https://www.ncbi.nlm.nih.gov/pubmed/20529923
http://dx.doi.org/10.1093/bioinformatics/btq214
_version_ 1782182115163504640
author Bansal, Vikas
author_facet Bansal, Vikas
author_sort Bansal, Vikas
collection PubMed
description Motivation: Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. Results: We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80–85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3–5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Availability: Implementation of this method is available at http://polymorphism.scripps.edu/∼vbansal/software/CRISP/ Contact: vbansal@scripps.edu
format Text
id pubmed-2881398
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-28813982010-06-08 A statistical method for the detection of variants from next-generation resequencing of DNA pools Bansal, Vikas Bioinformatics Ismb 2010 Conference Proceedings July 11 to July 13, 2010, Boston, Ma, Usa Motivation: Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. Results: We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80–85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3–5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Availability: Implementation of this method is available at http://polymorphism.scripps.edu/∼vbansal/software/CRISP/ Contact: vbansal@scripps.edu Oxford University Press 2010-06-15 2010-06-01 /pmc/articles/PMC2881398/ /pubmed/20529923 http://dx.doi.org/10.1093/bioinformatics/btq214 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Ismb 2010 Conference Proceedings July 11 to July 13, 2010, Boston, Ma, Usa
Bansal, Vikas
A statistical method for the detection of variants from next-generation resequencing of DNA pools
title A statistical method for the detection of variants from next-generation resequencing of DNA pools
title_full A statistical method for the detection of variants from next-generation resequencing of DNA pools
title_fullStr A statistical method for the detection of variants from next-generation resequencing of DNA pools
title_full_unstemmed A statistical method for the detection of variants from next-generation resequencing of DNA pools
title_short A statistical method for the detection of variants from next-generation resequencing of DNA pools
title_sort statistical method for the detection of variants from next-generation resequencing of dna pools
topic Ismb 2010 Conference Proceedings July 11 to July 13, 2010, Boston, Ma, Usa
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881398/
https://www.ncbi.nlm.nih.gov/pubmed/20529923
http://dx.doi.org/10.1093/bioinformatics/btq214
work_keys_str_mv AT bansalvikas astatisticalmethodforthedetectionofvariantsfromnextgenerationresequencingofdnapools
AT bansalvikas statisticalmethodforthedetectionofvariantsfromnextgenerationresequencingofdnapools