Genome-wide algorithm for detecting CNV associations with diseases

BACKGROUND: SNP genotyping arrays have been developed to characterize single-nucleotide polymorphisms (SNPs) and DNA copy number variations (CNVs). Nonparametric and model-based statistical algorithms have been developed to detect CNVs from SNP data using the marker intensities. However, these algor...

Bibliographic Details
Main Authors: Xu, Yaji, Peng, Bo, Fu, Yunxin, Amos, Christopher I
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3173460/
https://www.ncbi.nlm.nih.gov/pubmed/21827692
http://dx.doi.org/10.1186/1471-2105-12-331
author Xu, Yaji
Peng, Bo
Fu, Yunxin
Amos, Christopher I
collection PubMed
description BACKGROUND: SNP genotyping arrays have been developed to characterize single-nucleotide polymorphisms (SNPs) and DNA copy number variations (CNVs). Nonparametric and model-based statistical algorithms have been developed to detect CNVs from SNP data using the marker intensities. However, these algorithms lack specificity to detect small CNVs owing to the high false positive rate when calling CNVs based on the intensity values. Therefore, the resulting association tests lack power even if the CNVs affecting disease risk are common. An alternative procedure called PennCNV uses information from both the marker intensities and the genotypes and therefore has increased sensitivity. RESULTS: By using the hidden Markov model (HMM) implemented in PennCNV to derive the probabilities of different copy number states, which we subsequently used in a logistic regression model, we developed a new genome-wide algorithm to detect CNV associations with diseases. We compared this new method with an association test applied to the most probable copy number state for each individual, which PennCNV provides after it performs an initial HMM analysis followed by application of the Viterbi algorithm, a step that removes information about copy number probabilities. In one of our simulation studies, we showed that for large CNVs (number of SNPs ≥ 10), the association tests based on PennCNV calls gave more significant results, but the new algorithm retained high power. For small CNVs (number of SNPs < 10), the logistic algorithm provided smaller average p-values (e.g., p = 7.54e-17 when relative risk RR = 3.0) in all the scenarios and could capture signals that PennCNV did not (e.g., p = 0.020 when RR = 3.0).
From a second set of simulations, we showed that the new algorithm is more powerful in detecting disease associations with small CNVs (number of SNPs ranging from 3 to 5) under different penetrance models (e.g., when RR = 3.0, for relatively weak signals, power = 0.8030 compared with 0.2879 obtained from the association tests based on PennCNV calls). The new method was implemented in the software GWCNV. It is freely available at http://gwcnv.sourceforge.net, distributed under a GPL license. CONCLUSIONS: We conclude that the new algorithm is more sensitive and can be more powerful in detecting CNV associations with diseases than the existing HMM algorithm, especially when the CNV association signal is weak and a limited number of SNPs are located in the CNV.
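The contrast the abstract draws, between testing association on hard copy number calls and testing it on copy number state probabilities, can be illustrated with a small sketch. This is not the GWCNV implementation: the toy posterior probabilities, the two-state simplification (copy number 1 or 2), the use of the posterior-expected copy number as a single covariate, and all function names below are assumptions made for illustration only.

```python
import math

def expected_copy_number(posteriors):
    """Posterior-weighted mean copy number; `posteriors` maps copy number -> probability."""
    return sum(cn * p for cn, p in posteriors.items())

def most_probable_copy_number(posteriors):
    """Hard call: the single most probable state, discarding the probabilities."""
    return max(posteriors, key=posteriors.get)

def _score_and_information(b0, b1, x, y):
    """Gradient and 2x2 Fisher information of the logistic log-likelihood."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11

def logistic_wald_test(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b1, Wald p-value)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = _score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} * gradient, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of b1 from the information matrix at the fitted values.
    _, _, h00, h01, h11 = _score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    p_value = math.erfc(abs(b1 / se_b1) / math.sqrt(2.0))
    return b1, p_value

# Hypothetical posterior probabilities of a one-copy deletion at a single locus.
p_deletion_cases = [0.45, 0.40, 0.48, 0.10, 0.42, 0.90, 0.44, 0.70]
p_deletion_controls = [0.05, 0.10, 0.45, 0.08, 0.02, 0.12, 0.06, 0.60]
posteriors = [{1: p, 2: 1.0 - p} for p in p_deletion_cases + p_deletion_controls]
status = [1] * len(p_deletion_cases) + [0] * len(p_deletion_controls)

# Covariate 1: hard calls (most probable state). Covariate 2: expected copy number.
hard_calls = [most_probable_copy_number(post) for post in posteriors]
soft_dosage = [expected_copy_number(post) for post in posteriors]

b1_hard, p_hard = logistic_wald_test(hard_calls, status)
b1_soft, p_soft = logistic_wald_test(soft_dosage, status)
print("hard calls:      b1 =", round(b1_hard, 3), " p =", round(p_hard, 4))
print("expected copies: b1 =", round(b1_soft, 3), " p =", round(p_soft, 4))
```

In this toy data most cases carry a moderate deletion probability that stays below 0.5, so the hard call reports them as normal and the signal is lost, while the probability-based covariate retains it. Real PennCNV output has more copy number states than the two assumed here, and how GWCNV enters the state probabilities into its regression should be checked against the software itself.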
format Online
Article
Text
id pubmed-3173460
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3173460 2011-09-15 Genome-wide algorithm for detecting CNV associations with diseases Xu, Yaji Peng, Bo Fu, Yunxin Amos, Christopher I BMC Bioinformatics Methodology Article BioMed Central 2011-08-09 /pmc/articles/PMC3173460/ /pubmed/21827692 http://dx.doi.org/10.1186/1471-2105-12-331 Text en Copyright ©2011 Xu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Genome-wide algorithm for detecting CNV associations with diseases
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3173460/
https://www.ncbi.nlm.nih.gov/pubmed/21827692
http://dx.doi.org/10.1186/1471-2105-12-331