
Comparison of multiple imputation and other methods for the analysis of imputed genotypes

BACKGROUND: Analysis of imputed genotypes is an important and routine component of genome-wide association studies and the increasing size of imputation reference panels has facilitated the ability to impute and test low-frequency variants for associations. In the context of genotype imputation, the...


Bibliographic Details
Main authors:	Auer, Paul L.; Wang, Gao; Li, Guangyou; DeWan, Andrew T.; Leal, Suzanne M.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10242917/ https://www.ncbi.nlm.nih.gov/pubmed/37277705 http://dx.doi.org/10.1186/s12864-023-09415-0

author	Auer, Paul L.; Wang, Gao; Li, Guangyou; DeWan, Andrew T.; Leal, Suzanne M.
author_facet	Auer, Paul L.; Wang, Gao; Li, Guangyou; DeWan, Andrew T.; Leal, Suzanne M.
author_sort	Auer, Paul L.
collection	PubMed
description	BACKGROUND: Analysis of imputed genotypes is an important and routine component of genome-wide association studies, and the increasing size of imputation reference panels has facilitated the ability to impute and test low-frequency variants for associations. In the context of genotype imputation, the true genotype is unknown and genotypes are inferred with uncertainty using statistical models. Here, we present a novel method for integrating imputation uncertainty into statistical association tests using a fully conditional multiple imputation (MI) approach, which is implemented using the Substantive Model Compatible Fully Conditional Specification (SMCFCS). We compared the performance of this method to an unconditional MI and two additional approaches that have been shown to demonstrate excellent performance: regression with dosages and a mixture of regression models (MRM). RESULTS: Our simulations considered a range of allele frequencies and imputation qualities based on data from the UK Biobank. We found that the unconditional MI was computationally costly and overly conservative across a wide range of settings. Analyzing data with Dosage, MRM, or MI SMCFCS resulted in greater power, including for low-frequency variants, compared to unconditional MI while effectively controlling type I error rates. MRM and MI SMCFCS are both more computationally intensive than using Dosage. CONCLUSIONS: The unconditional MI approach for association testing is overly conservative and we do not recommend its use in the context of imputed genotypes. Given its performance, speed, and ease of implementation, we recommend using Dosage for imputed genotypes with MAF [Formula: see text] 0.001 and Rsq [Formula: see text] 0.3. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-023-09415-0.
format	Online Article Text
id	pubmed-10242917
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-10242917 2023-06-07 Comparison of multiple imputation and other methods for the analysis of imputed genotypes Auer, Paul L.; Wang, Gao; Li, Guangyou; DeWan, Andrew T.; Leal, Suzanne M. BMC Genomics Research BACKGROUND: Analysis of imputed genotypes is an important and routine component of genome-wide association studies, and the increasing size of imputation reference panels has facilitated the ability to impute and test low-frequency variants for associations. In the context of genotype imputation, the true genotype is unknown and genotypes are inferred with uncertainty using statistical models. Here, we present a novel method for integrating imputation uncertainty into statistical association tests using a fully conditional multiple imputation (MI) approach, which is implemented using the Substantive Model Compatible Fully Conditional Specification (SMCFCS). We compared the performance of this method to an unconditional MI and two additional approaches that have been shown to demonstrate excellent performance: regression with dosages and a mixture of regression models (MRM). RESULTS: Our simulations considered a range of allele frequencies and imputation qualities based on data from the UK Biobank. We found that the unconditional MI was computationally costly and overly conservative across a wide range of settings. Analyzing data with Dosage, MRM, or MI SMCFCS resulted in greater power, including for low-frequency variants, compared to unconditional MI while effectively controlling type I error rates. MRM and MI SMCFCS are both more computationally intensive than using Dosage. CONCLUSIONS: The unconditional MI approach for association testing is overly conservative and we do not recommend its use in the context of imputed genotypes. Given its performance, speed, and ease of implementation, we recommend using Dosage for imputed genotypes with MAF [Formula: see text] 0.001 and Rsq [Formula: see text] 0.3. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-023-09415-0. BioMed Central 2023-06-06 /pmc/articles/PMC10242917/ /pubmed/37277705 http://dx.doi.org/10.1186/s12864-023-09415-0 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Auer, Paul L.; Wang, Gao; Li, Guangyou; DeWan, Andrew T.; Leal, Suzanne M. Comparison of multiple imputation and other methods for the analysis of imputed genotypes
title	Comparison of multiple imputation and other methods for the analysis of imputed genotypes
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10242917/ https://www.ncbi.nlm.nih.gov/pubmed/37277705 http://dx.doi.org/10.1186/s12864-023-09415-0
