Copy Number Variation detection from 1000 Genomes project exon capture sequencing data

BACKGROUND: DNA capture technologies combined with high-throughput sequencing now enable cost-effective, deep-coverage, targeted sequencing of complete exomes. This is well suited for SNP discovery and genotyping. However there has been little attention devoted to Copy Number Variation (CNV) detecti...

Bibliographic Details
Main Authors:	Wu, Jiantao; Grzeda, Krzysztof R; Stewart, Chip; Grubert, Fabian; Urban, Alexander E; Snyder, Michael P; Marth, Gabor T
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3563612/ https://www.ncbi.nlm.nih.gov/pubmed/23157288 http://dx.doi.org/10.1186/1471-2105-13-305

_version_	1782258225088823296
author	Wu, Jiantao; Grzeda, Krzysztof R; Stewart, Chip; Grubert, Fabian; Urban, Alexander E; Snyder, Michael P; Marth, Gabor T
author_facet	Wu, Jiantao; Grzeda, Krzysztof R; Stewart, Chip; Grubert, Fabian; Urban, Alexander E; Snyder, Michael P; Marth, Gabor T
author_sort	Wu, Jiantao
collection	PubMed
description	BACKGROUND: DNA capture technologies combined with high-throughput sequencing now enable cost-effective, deep-coverage, targeted sequencing of complete exomes. This is well suited for SNP discovery and genotyping. However there has been little attention devoted to Copy Number Variation (CNV) detection from exome capture datasets despite the potentially high impact of CNVs in exonic regions on protein function. RESULTS: As members of the 1000 Genomes Project analysis effort, we investigated 697 samples in which 931 genes were targeted and sampled with 454 or Illumina paired-end sequencing. We developed a rigorous Bayesian method to detect CNVs in the genes, based on read depth within target regions. Despite substantial variability in read coverage across samples and targeted exons, we were able to identify 107 heterozygous deletions in the dataset. The experimentally determined false discovery rate (FDR) of the cleanest dataset from the Wellcome Trust Sanger Institute is 12.5%. We were able to substantially improve the FDR in a subset of gene deletion candidates that were adjacent to another gene deletion call (17 calls). The estimated sensitivity of our call-set was 45%. CONCLUSIONS: This study demonstrates that exonic sequencing datasets, collected both in population based and medical sequencing projects, will be a useful substrate for detecting genic CNV events, particularly deletions. Based on the number of events we found and the sensitivity of the methods in the present dataset, we estimate on average 16 genic heterozygous deletions per individual genome. Our power analysis informs ongoing and future projects about sequencing depth and uniformity of read coverage required for efficient detection.
format	Online Article Text
id	pubmed-3563612
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3563612 2013-02-08 Copy Number Variation detection from 1000 Genomes project exon capture sequencing data Wu, Jiantao Grzeda, Krzysztof R Stewart, Chip Grubert, Fabian Urban, Alexander E Snyder, Michael P Marth, Gabor T BMC Bioinformatics Methodology Article BACKGROUND: DNA capture technologies combined with high-throughput sequencing now enable cost-effective, deep-coverage, targeted sequencing of complete exomes. This is well suited for SNP discovery and genotyping. However there has been little attention devoted to Copy Number Variation (CNV) detection from exome capture datasets despite the potentially high impact of CNVs in exonic regions on protein function. RESULTS: As members of the 1000 Genomes Project analysis effort, we investigated 697 samples in which 931 genes were targeted and sampled with 454 or Illumina paired-end sequencing. We developed a rigorous Bayesian method to detect CNVs in the genes, based on read depth within target regions. Despite substantial variability in read coverage across samples and targeted exons, we were able to identify 107 heterozygous deletions in the dataset. The experimentally determined false discovery rate (FDR) of the cleanest dataset from the Wellcome Trust Sanger Institute is 12.5%. We were able to substantially improve the FDR in a subset of gene deletion candidates that were adjacent to another gene deletion call (17 calls). The estimated sensitivity of our call-set was 45%. CONCLUSIONS: This study demonstrates that exonic sequencing datasets, collected both in population based and medical sequencing projects, will be a useful substrate for detecting genic CNV events, particularly deletions. Based on the number of events we found and the sensitivity of the methods in the present dataset, we estimate on average 16 genic heterozygous deletions per individual genome. Our power analysis informs ongoing and future projects about sequencing depth and uniformity of read coverage required for efficient detection. BioMed Central 2012-11-17 /pmc/articles/PMC3563612/ /pubmed/23157288 http://dx.doi.org/10.1186/1471-2105-13-305 Text en Copyright ©2012 Wu et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Wu, Jiantao Grzeda, Krzysztof R Stewart, Chip Grubert, Fabian Urban, Alexander E Snyder, Michael P Marth, Gabor T Copy Number Variation detection from 1000 Genomes project exon capture sequencing data
title	Copy Number Variation detection from 1000 Genomes project exon capture sequencing data
title_full	Copy Number Variation detection from 1000 Genomes project exon capture sequencing data
title_fullStr	Copy Number Variation detection from 1000 Genomes project exon capture sequencing data
title_full_unstemmed	Copy Number Variation detection from 1000 Genomes project exon capture sequencing data
title_short	Copy Number Variation detection from 1000 Genomes project exon capture sequencing data
title_sort	copy number variation detection from 1000 genomes project exon capture sequencing data
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3563612/ https://www.ncbi.nlm.nih.gov/pubmed/23157288 http://dx.doi.org/10.1186/1471-2105-13-305
work_keys_str_mv	AT wujiantao copynumbervariationdetectionfrom1000genomesprojectexoncapturesequencingdata AT grzedakrzysztofr copynumbervariationdetectionfrom1000genomesprojectexoncapturesequencingdata AT stewartchip copynumbervariationdetectionfrom1000genomesprojectexoncapturesequencingdata AT grubertfabian copynumbervariationdetectionfrom1000genomesprojectexoncapturesequencingdata AT urbanalexandere copynumbervariationdetectionfrom1000genomesprojectexoncapturesequencingdata AT snydermichaelp copynumbervariationdetectionfrom1000genomesprojectexoncapturesequencingdata AT marthgabort copynumbervariationdetectionfrom1000genomesprojectexoncapturesequencingdata
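
The abstract in this record describes a Bayesian method that calls CNVs from read depth within target regions, but the record does not give the model itself. As a rough illustration of the general idea only, the sketch below computes a posterior over copy-number states for one gene in one sample. The Poisson likelihood, the floor on the zero-copy rate, the prior values, and all names (`copy_number_posterior`, `EXAMPLE_PRIORS`) are assumptions made for this example; they are not taken from the article.

```python
import math

# Hypothetical prior over copy-number states; the values are illustrative only.
EXAMPLE_PRIORS = {0: 0.0005, 1: 0.005, 2: 0.989, 3: 0.0055}


def poisson_log_pmf(k, lam):
    """Log-probability of observing k events under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def copy_number_posterior(observed_reads, expected_diploid_reads, priors):
    """Posterior probability of each copy-number state for one target.

    Assumes (for illustration) that read depth is Poisson-distributed with a
    mean proportional to copy number: expected_diploid_reads * cn / 2.
    """
    log_post = {}
    for cn, prior in priors.items():
        # Floor the rate so a homozygous deletion still tolerates stray reads.
        lam = max(expected_diploid_reads * cn / 2.0,
                  0.01 * expected_diploid_reads)
        log_post[cn] = math.log(prior) + poisson_log_pmf(observed_reads, lam)

    # Normalize in log space to avoid underflow.
    peak = max(log_post.values())
    weights = {cn: math.exp(lp - peak) for cn, lp in log_post.items()}
    total = sum(weights.values())
    return {cn: w / total for cn, w in weights.items()}


if __name__ == "__main__":
    # A gene with half the expected diploid depth favors a heterozygous deletion.
    posterior = copy_number_posterior(50, 100.0, EXAMPLE_PRIORS)
    print(max(posterior, key=posterior.get))
```

In practice the expected diploid depth would have to be normalized per sample and per target, since the abstract notes substantial variability in read coverage across both; that step is omitted here.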
