
Comparison of multiple imputation and other methods for the analysis of imputed genotypes

BACKGROUND: Analysis of imputed genotypes is an important and routine component of genome-wide association studies and the increasing size of imputation reference panels has facilitated the ability to impute and test low-frequency variants for associations. In the context of genotype imputation, the...


Bibliographic Details
Main authors: Auer, Paul L., Wang, Gao, Li, Guangyou, DeWan, Andrew T., Leal, Suzanne M.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10242917/
https://www.ncbi.nlm.nih.gov/pubmed/37277705
http://dx.doi.org/10.1186/s12864-023-09415-0
_version_ 1785054321479516160
author Auer, Paul L.
Wang, Gao
Li, Guangyou
DeWan, Andrew T.
Leal, Suzanne M.
author_facet Auer, Paul L.
Wang, Gao
Li, Guangyou
DeWan, Andrew T.
Leal, Suzanne M.
author_sort Auer, Paul L.
collection PubMed
description BACKGROUND: Analysis of imputed genotypes is an important and routine component of genome-wide association studies and the increasing size of imputation reference panels has facilitated the ability to impute and test low-frequency variants for associations. In the context of genotype imputation, the true genotype is unknown and genotypes are inferred with uncertainty using statistical models. Here, we present a novel method for integrating imputation uncertainty into statistical association tests using a fully conditional multiple imputation (MI) approach which is implemented using the Substantive Model Compatible Fully Conditional Specification (SMCFCS). We compared the performance of this method to an unconditional MI and two additional approaches that have been shown to demonstrate excellent performance: regression with dosages and a mixture of regression models (MRM). RESULTS: Our simulations considered a range of allele frequencies and imputation qualities based on data from the UK Biobank. We found that the unconditional MI was computationally costly and overly conservative across a wide range of settings. Analyzing data with Dosage, MRM, or MI SMCFCS resulted in greater power, including for low-frequency variants, compared to unconditional MI while effectively controlling type I error rates. MRM and MI SMCFCS are both more computationally intensive than using Dosage. CONCLUSIONS: The unconditional MI approach for association testing is overly conservative and we do not recommend its use in the context of imputed genotypes. Given its performance, speed, and ease of implementation, we recommend using Dosage for imputed genotypes with MAF [Formula: see text] 0.001 and Rsq [Formula: see text] 0.3. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-023-09415-0.
format Online
Article
Text
id pubmed-10242917
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10242917 2023-06-07 Comparison of multiple imputation and other methods for the analysis of imputed genotypes Auer, Paul L. Wang, Gao Li, Guangyou DeWan, Andrew T. Leal, Suzanne M. BMC Genomics Research BACKGROUND: Analysis of imputed genotypes is an important and routine component of genome-wide association studies and the increasing size of imputation reference panels has facilitated the ability to impute and test low-frequency variants for associations. In the context of genotype imputation, the true genotype is unknown and genotypes are inferred with uncertainty using statistical models. Here, we present a novel method for integrating imputation uncertainty into statistical association tests using a fully conditional multiple imputation (MI) approach which is implemented using the Substantive Model Compatible Fully Conditional Specification (SMCFCS). We compared the performance of this method to an unconditional MI and two additional approaches that have been shown to demonstrate excellent performance: regression with dosages and a mixture of regression models (MRM). RESULTS: Our simulations considered a range of allele frequencies and imputation qualities based on data from the UK Biobank. We found that the unconditional MI was computationally costly and overly conservative across a wide range of settings. Analyzing data with Dosage, MRM, or MI SMCFCS resulted in greater power, including for low-frequency variants, compared to unconditional MI while effectively controlling type I error rates. MRM and MI SMCFCS are both more computationally intensive than using Dosage. CONCLUSIONS: The unconditional MI approach for association testing is overly conservative and we do not recommend its use in the context of imputed genotypes. Given its performance, speed, and ease of implementation, we recommend using Dosage for imputed genotypes with MAF [Formula: see text] 0.001 and Rsq [Formula: see text] 0.3. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-023-09415-0. BioMed Central 2023-06-06 /pmc/articles/PMC10242917/ /pubmed/37277705 http://dx.doi.org/10.1186/s12864-023-09415-0 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Auer, Paul L.
Wang, Gao
Li, Guangyou
DeWan, Andrew T.
Leal, Suzanne M.
Comparison of multiple imputation and other methods for the analysis of imputed genotypes
title Comparison of multiple imputation and other methods for the analysis of imputed genotypes
title_full Comparison of multiple imputation and other methods for the analysis of imputed genotypes
title_fullStr Comparison of multiple imputation and other methods for the analysis of imputed genotypes
title_full_unstemmed Comparison of multiple imputation and other methods for the analysis of imputed genotypes
title_short Comparison of multiple imputation and other methods for the analysis of imputed genotypes
title_sort comparison of multiple imputation and other methods for the analysis of imputed genotypes
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10242917/
https://www.ncbi.nlm.nih.gov/pubmed/37277705
http://dx.doi.org/10.1186/s12864-023-09415-0
work_keys_str_mv AT auerpaull comparisonofmultipleimputationandothermethodsfortheanalysisofimputedgenotypes
AT wanggao comparisonofmultipleimputationandothermethodsfortheanalysisofimputedgenotypes
AT liguangyou comparisonofmultipleimputationandothermethodsfortheanalysisofimputedgenotypes
AT dewanandrewt comparisonofmultipleimputationandothermethodsfortheanalysisofimputedgenotypes
AT lealsuzannem comparisonofmultipleimputationandothermethodsfortheanalysisofimputedgenotypes