Comparison of multiple imputation and other methods for the analysis of imputed genotypes
Main authors: | Auer, Paul L.; Wang, Gao; Li, Guangyou; DeWan, Andrew T.; Leal, Suzanne M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10242917/ https://www.ncbi.nlm.nih.gov/pubmed/37277705 http://dx.doi.org/10.1186/s12864-023-09415-0 |
Field | Value |
---|---|
author | Auer, Paul L.; Wang, Gao; Li, Guangyou; DeWan, Andrew T.; Leal, Suzanne M. |
collection | PubMed |
description | BACKGROUND: Analysis of imputed genotypes is an important and routine component of genome-wide association studies, and the increasing size of imputation reference panels has made it possible to impute and test low-frequency variants for association. In the context of genotype imputation, the true genotype is unknown and genotypes are inferred with uncertainty using statistical models. Here, we present a novel method for integrating imputation uncertainty into statistical association tests using a fully conditional multiple imputation (MI) approach, implemented with the Substantive Model Compatible Fully Conditional Specification (SMCFCS). We compared the performance of this method with an unconditional MI and with two additional approaches that have previously shown excellent performance: regression with dosages (Dosage) and a mixture of regression models (MRM). RESULTS: Our simulations considered a range of allele frequencies and imputation qualities based on data from the UK Biobank. We found that the unconditional MI was computationally costly and overly conservative across a wide range of settings. Analyzing data with Dosage, MRM, or MI SMCFCS resulted in greater power than unconditional MI, including for low-frequency variants, while effectively controlling type I error rates. MRM and MI SMCFCS are both more computationally intensive than Dosage. CONCLUSIONS: The unconditional MI approach for association testing is overly conservative, and we do not recommend its use in the context of imputed genotypes. Given its performance, speed, and ease of implementation, we recommend using Dosage for imputed genotypes with MAF [Formula: see text] 0.001 and Rsq [Formula: see text] 0.3. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-023-09415-0. |
format | Online Article Text |
id | pubmed-10242917 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10242917 2023-06-07; BMC Genomics; Research; BioMed Central 2023-06-06 /pmc/articles/PMC10242917/ /pubmed/37277705 http://dx.doi.org/10.1186/s12864-023-09415-0; Text en; © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/)) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Comparison of multiple imputation and other methods for the analysis of imputed genotypes |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10242917/ https://www.ncbi.nlm.nih.gov/pubmed/37277705 http://dx.doi.org/10.1186/s12864-023-09415-0 |
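The abstract contrasts regression on expected dosages (Dosage) with multiple imputation. As a rough illustration only, the sketch below (Python, standard library only) computes an expected dosage E[G] = P(G=1) + 2·P(G=2) from per-sample genotype probabilities, fits a single-covariate linear regression on it, and, for comparison, performs an unconditional MI in which genotypes are drawn from the imputation probabilities without reference to the phenotype, pooling the per-imputation estimates with Rubin's rules. The function names, the linear model without covariates, the default of 20 imputations and the toy data are all assumptions made for illustration; this is not the authors' SMCFCS or MRM implementation, and it does not condition on the outcome as SMCFCS does.

```python
import math
import random
from statistics import fmean


def expected_dosage(probs):
    """Expected alt-allele count from (P(G=0), P(G=1), P(G=2))."""
    _, p1, p2 = probs
    return p1 + 2.0 * p2


def ols_slope(x, y):
    """Simple linear regression of y on x; returns (slope, variance of slope)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    return beta, (rss / (n - 2)) / sxx


def dosage_regression(geno_probs, y):
    """Hypothetical helper: regress the phenotype on expected dosages."""
    return ols_slope([expected_dosage(p) for p in geno_probs], y)


def rubins_rules(estimates, variances):
    """Pool M per-imputation estimates; returns (pooled estimate, total variance)."""
    m = len(estimates)
    qbar = fmean(estimates)
    ubar = fmean(variances)  # mean within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    return qbar, ubar + (1 + 1 / m) * b


def unconditional_mi(geno_probs, y, m=20, seed=1):
    """Hypothetical helper: draw genotypes from the imputation probabilities only
    (the phenotype is ignored when drawing), refit each completed data set, pool."""
    rng = random.Random(seed)
    ests, vars_ = [], []
    for _ in range(m):
        g = [rng.choices((0, 1, 2), weights=p)[0] for p in geno_probs]
        beta, var = ols_slope(g, y)
        ests.append(beta)
        vars_.append(var)
    return rubins_rules(ests, vars_)


if __name__ == "__main__":
    # Toy data (hypothetical): 500 individuals, true effect 0.3 per allele.
    rng = random.Random(42)
    probs, y = [], []
    for _ in range(500):
        g = rng.choices((0, 1, 2), weights=(0.49, 0.42, 0.09))[0]
        # Blur the true genotype to mimic imperfect imputation.
        p = [0.1, 0.1, 0.1]
        p[g] = 0.8
        probs.append(tuple(p))
        y.append(0.3 * g + rng.gauss(0.0, 1.0))
    for name, (est, var) in [
        ("Dosage", dosage_regression(probs, y)),
        ("Unconditional MI", unconditional_mi(probs, y)),
    ]:
        print(f"{name}: beta={est:.3f}, z={est / math.sqrt(var):.2f}")
```

Drawing genotypes independently of the phenotype adds between-imputation noise that is unrelated to the outcome, which is one intuitive reason an unconditional MI of this kind can be conservative; the dosage approach needs only a single fit per variant.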