Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes

Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e. the summary association statistics are measured for all variants in contributing studies. In practice, genotype...

Bibliographic Details
Main authors: Jiang, Yu; Chen, Sai; McGuire, Daniel; Chen, Fang; Liu, Mengzhen; Iacono, William G.; Hewitt, John K.; Hokanson, John E.; Krauter, Kenneth; Laakso, Markku; Li, Kevin W.; Lutz, Sharon M.; McGue, Matthew; Pandit, Anita; Zajac, Gregory J. M.; Boehnke, Michael; Abecasis, Goncalo R.; Vrieze, Scott I.; Zhan, Xiaowei; Jiang, Bibo; Liu, Dajiang J.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6063450/
https://www.ncbi.nlm.nih.gov/pubmed/30016313
http://dx.doi.org/10.1371/journal.pgen.1007452
_version_ 1783342559896010752
author Jiang, Yu
Chen, Sai
McGuire, Daniel
Chen, Fang
Liu, Mengzhen
Iacono, William G.
Hewitt, John K.
Hokanson, John E.
Krauter, Kenneth
Laakso, Markku
Li, Kevin W.
Lutz, Sharon M.
McGue, Matthew
Pandit, Anita
Zajac, Gregory J. M.
Boehnke, Michael
Abecasis, Goncalo R.
Vrieze, Scott I.
Zhan, Xiaowei
Jiang, Bibo
Liu, Dajiang J.
author_sort Jiang, Yu
collection PubMed
description Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e. the summary association statistics are measured for all variants in contributing studies. In practice, genotype imputation is not always effective. This may be the case when targeted genotyping/sequencing assays are used or when the un-typed genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Existing methods for imputing missing summary association statistics and using imputed values in meta-analysis, approximate conditional analysis, or simple strategies such as complete case analysis all have theoretical limitations. Applying these approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values. Based on this estimator, we proposed a score statistic called PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to one of the largest meta-analyses to date for the cigarettes-per-day phenotype. Using the new method, we identified multiple novel independently associated variants at known loci for tobacco use, which were otherwise missed by alternative methods. 
Together, the phenotypic variance explained by these variants was 1.1%, improving that of previously reported associations by 71%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.
format Online
Article
Text
id pubmed-6063450
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6063450 2018-08-09 Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes Jiang, Yu; Chen, Sai; McGuire, Daniel; Chen, Fang; Liu, Mengzhen; Iacono, William G.; Hewitt, John K.; Hokanson, John E.; Krauter, Kenneth; Laakso, Markku; Li, Kevin W.; Lutz, Sharon M.; McGue, Matthew; Pandit, Anita; Zajac, Gregory J. M.; Boehnke, Michael; Abecasis, Goncalo R.; Vrieze, Scott I.; Zhan, Xiaowei; Jiang, Bibo; Liu, Dajiang J. PLoS Genet Research Article Public Library of Science 2018-07-17 /pmc/articles/PMC6063450/ /pubmed/30016313 http://dx.doi.org/10.1371/journal.pgen.1007452 Text en © 2018 Jiang et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6063450/
https://www.ncbi.nlm.nih.gov/pubmed/30016313
http://dx.doi.org/10.1371/journal.pgen.1007452