
Fast detection of de novo copy number variants from SNP arrays for case-parent trios

Bibliographic Details
Main authors: Scharpf, Robert B; Beaty, Terri H; Schwender, Holger; Younkin, Samuel G; Scott, Alan F; Ruczinski, Ingo
Format: Online, Article, Text
Language: English
Published: BioMed Central 2012
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3576329/
https://www.ncbi.nlm.nih.gov/pubmed/23234608
http://dx.doi.org/10.1186/1471-2105-13-330
collection PubMed
description BACKGROUND: In studies of case-parent trios, we define copy number variants (CNVs) in the offspring that differ from the parental copy numbers as de novo and of interest for their potential functional role in disease. Among the leading array-based methods for discovery of de novo CNVs in case-parent trios is the joint hidden Markov model (HMM) implemented in the PennCNV software. However, the computational demands of the joint HMM are substantial and the extent to which false positive identifications occur in case-parent trios has not been well described. We evaluate these issues in a study of oral cleft case-parent trios.

RESULTS: Our analysis of the oral cleft trios reveals that genomic waves represent a substantial source of false positive identifications in the joint HMM, despite a wave-correction implementation in PennCNV. In addition, the noise of low-level summaries of relative copy number (log R ratios) is strongly associated with batch and correlated with the frequency of de novo CNV calls. Exploiting the trio design, we propose a univariate statistic for relative copy number referred to as the minimum distance that can reduce technical variation from probe effects and genomic waves. We use circular binary segmentation to segment the minimum distance and maximum a posteriori estimation to infer de novo CNVs from the segmented genome. Compared to PennCNV on simulated data, MinimumDistance identifies fewer false positives on average and is comparable to PennCNV with respect to false negatives. Genomic waves contribute to discordance of PennCNV and MinimumDistance for high coverage de novo calls, while highly concordant calls on chromosome 22 were validated by quantitative PCR. Computationally, MinimumDistance provides a nearly 8-fold increase in speed relative to the joint HMM in a study of oral cleft trios.

CONCLUSIONS: Our results indicate that batch effects and genomic waves are important considerations for case-parent studies of de novo CNV, and that the minimum distance is an effective statistic for reducing technical variation contributing to false de novo discoveries. Coupled with segmentation and maximum a posteriori estimation, our algorithm compares favorably to the joint HMM with MinimumDistance being much faster.
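The abstract describes the minimum distance only in words. One plausible reading can be written as a short Python function. This record does not define the statistic, so the function below assumes the following: at each marker, take the offspring-minus-father and offspring-minus-mother log R ratio differences, then keep whichever is closer to zero, with its sign. This is a minimal sketch and not code from the MinimumDistance software. The function name `minimum_distance` and the toy values are hypothetical.

```python
"""Sketch of a per-marker 'minimum distance' statistic for one case-parent trio.

Assumption (not stated in this record): the statistic is the signed
offspring-minus-parent log R ratio difference with the smaller absolute value.
"""
from typing import List, Sequence


def minimum_distance(
    lrr_offspring: Sequence[float],
    lrr_father: Sequence[float],
    lrr_mother: Sequence[float],
) -> List[float]:
    """Return the signed minimum distance at each marker.

    For marker i: d_F = O_i - F_i and d_M = O_i - M_i; the difference that is
    closer to zero is kept, preserving its sign.
    """
    if not (len(lrr_offspring) == len(lrr_father) == len(lrr_mother)):
        raise ValueError("all three log R ratio vectors must have the same length")
    result: List[float] = []
    for o, f, m in zip(lrr_offspring, lrr_father, lrr_mother):
        d_father = o - f
        d_mother = o - m
        # On an exact tie the father's difference is kept (an arbitrary choice).
        result.append(d_father if abs(d_father) <= abs(d_mother) else d_mother)
    return result


if __name__ == "__main__":
    # Hypothetical toy log R ratios: markers 3-4 mimic a loss seen only in the offspring.
    offspring = [0.02, -0.01, -0.55, -0.60, 0.03]
    father = [0.00, 0.01, 0.02, -0.02, 0.01]
    mother = [0.05, -0.04, -0.01, 0.01, 0.00]
    print([round(d, 2) for d in minimum_distance(offspring, father, mother)])
    # prints [0.02, -0.02, -0.54, -0.58, 0.02]
```

Under this reading, an additive artifact shared by all three samples at a marker (for example, a probe effect) cancels in both differences. A CNV inherited from either parent yields a value near zero, and a change present only in the offspring stays away from zero. In the paper, circular binary segmentation and maximum a posteriori estimation are then applied downstream of this statistic. Neither step is shown here.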
id pubmed-3576329
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3576329, 2013-02-22. BMC Bioinformatics, Methodology Article. BioMed Central, 2012-12-12. Text, en. Copyright ©2012 Scharpf et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Article