Bayesian copy number detection and association in large-scale studies
BACKGROUND: Germline copy number variants (CNVs) increase risk for many diseases, yet detection of CNVs and quantifying their contribution to disease risk in large-scale studies is challenging due to biological and technical sources of heterogeneity that vary across the genome within and between sam...
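Main authors: | Cristiano, Stephen; McKean, David; Carey, Jacob; Bracci, Paige; Brennan, Paul; Chou, Michael; Du, Mengmeng; Gallinger, Steven; Goggins, Michael G.; Hassan, Manal M.; Hung, Rayjean J.; Kurtz, Robert C.; Li, Donghui; Lu, Lingeng; Neale, Rachel; Olson, Sara; Petersen, Gloria; Rabe, Kari G.; Fu, Jack; Risch, Harvey; Rosner, Gary L.; Ruczinski, Ingo; Klein, Alison P.; Scharpf, Robert B. |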
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7487704/ https://www.ncbi.nlm.nih.gov/pubmed/32894098 http://dx.doi.org/10.1186/s12885-020-07304-3 |
_version_ | 1783581542047547392 |
---|---|
author | Cristiano, Stephen; McKean, David; Carey, Jacob; Bracci, Paige; Brennan, Paul; Chou, Michael; Du, Mengmeng; Gallinger, Steven; Goggins, Michael G.; Hassan, Manal M.; Hung, Rayjean J.; Kurtz, Robert C.; Li, Donghui; Lu, Lingeng; Neale, Rachel; Olson, Sara; Petersen, Gloria; Rabe, Kari G.; Fu, Jack; Risch, Harvey; Rosner, Gary L.; Ruczinski, Ingo; Klein, Alison P.; Scharpf, Robert B. |
author_facet | Cristiano, Stephen; McKean, David; Carey, Jacob; Bracci, Paige; Brennan, Paul; Chou, Michael; Du, Mengmeng; Gallinger, Steven; Goggins, Michael G.; Hassan, Manal M.; Hung, Rayjean J.; Kurtz, Robert C.; Li, Donghui; Lu, Lingeng; Neale, Rachel; Olson, Sara; Petersen, Gloria; Rabe, Kari G.; Fu, Jack; Risch, Harvey; Rosner, Gary L.; Ruczinski, Ingo; Klein, Alison P.; Scharpf, Robert B. |
author_sort | Cristiano, Stephen |
collection | PubMed |
description | BACKGROUND: Germline copy number variants (CNVs) increase risk for many diseases, yet detection of CNVs and quantifying their contribution to disease risk in large-scale studies is challenging due to biological and technical sources of heterogeneity that vary across the genome within and between samples. METHODS: We developed an approach called CNPBayes to identify latent batch effects in genome-wide association studies involving copy number, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate the copy number uncertainty in the association model for disease. RESULTS: Applying a hidden Markov model (HMM) to identify CNVs in a large multi-site Pancreatic Cancer Case Control study (PanC4) of 7598 participants, we found CNV inference was highly sensitive to technical noise that varied appreciably among participants. Applying CNPBayes to this dataset, we found that the major sources of technical variation were linked to sample processing by the centralized laboratory and not the individual study sites. Modeling the latent batch effects at each CNV region hierarchically, we developed probabilistic estimates of copy number that were directly incorporated in a Bayesian regression model for pancreatic cancer risk. Candidate associations aided by this approach include deletions of 8q24 near regulatory elements of the tumor oncogene MYC and of Tumor Suppressor Candidate 3 (TUSC3). CONCLUSIONS: Laboratory effects may not account for the major sources of technical variation in genome-wide association studies. This study provides a robust Bayesian inferential framework for identifying latent batch effects, estimating copy number, and evaluating the role of copy number in heritable diseases. |
format | Online Article Text |
id | pubmed-7487704 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-74877042020-09-16 Bayesian copy number detection and association in large-scale studies Cristiano, Stephen McKean, David Carey, Jacob Bracci, Paige Brennan, Paul Chou, Michael Du, Mengmeng Gallinger, Steven Goggins, Michael G. Hassan, Manal M. Hung, Rayjean J. Kurtz, Robert C. Li, Donghui Lu, Lingeng Neale, Rachel Olson, Sara Petersen, Gloria Rabe, Kari G. Fu, Jack Risch, Harvey Rosner, Gary L. Ruczinski, Ingo Klein, Alison P. Scharpf, Robert B. BMC Cancer Research Article BACKGROUND: Germline copy number variants (CNVs) increase risk for many diseases, yet detection of CNVs and quantifying their contribution to disease risk in large-scale studies is challenging due to biological and technical sources of heterogeneity that vary across the genome within and between samples. METHODS: We developed an approach called CNPBayes to identify latent batch effects in genome-wide association studies involving copy number, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate the copy number uncertainty in the association model for disease. RESULTS: Applying a hidden Markov model (HMM) to identify CNVs in a large multi-site Pancreatic Cancer Case Control study (PanC4) of 7598 participants, we found CNV inference was highly sensitive to technical noise that varied appreciably among participants. Applying CNPBayes to this dataset, we found that the major sources of technical variation were linked to sample processing by the centralized laboratory and not the individual study sites. Modeling the latent batch effects at each CNV region hierarchically, we developed probabilistic estimates of copy number that were directly incorporated in a Bayesian regression model for pancreatic cancer risk. Candidate associations aided by this approach include deletions of 8q24 near regulatory elements of the tumor oncogene MYC and of Tumor Suppressor Candidate 3 (TUSC3). 
CONCLUSIONS: Laboratory effects may not account for the major sources of technical variation in genome-wide association studies. This study provides a robust Bayesian inferential framework for identifying latent batch effects, estimating copy number, and evaluating the role of copy number in heritable diseases. BioMed Central 2020-09-07 /pmc/articles/PMC7487704/ /pubmed/32894098 http://dx.doi.org/10.1186/s12885-020-07304-3 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Cristiano, Stephen McKean, David Carey, Jacob Bracci, Paige Brennan, Paul Chou, Michael Du, Mengmeng Gallinger, Steven Goggins, Michael G. Hassan, Manal M. Hung, Rayjean J. Kurtz, Robert C. Li, Donghui Lu, Lingeng Neale, Rachel Olson, Sara Petersen, Gloria Rabe, Kari G. Fu, Jack Risch, Harvey Rosner, Gary L. Ruczinski, Ingo Klein, Alison P. Scharpf, Robert B. Bayesian copy number detection and association in large-scale studies |
title | Bayesian copy number detection and association in large-scale studies |
title_full | Bayesian copy number detection and association in large-scale studies |
title_fullStr | Bayesian copy number detection and association in large-scale studies |
title_full_unstemmed | Bayesian copy number detection and association in large-scale studies |
title_short | Bayesian copy number detection and association in large-scale studies |
title_sort | bayesian copy number detection and association in large-scale studies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7487704/ https://www.ncbi.nlm.nih.gov/pubmed/32894098 http://dx.doi.org/10.1186/s12885-020-07304-3 |