Assessment of Imputation Quality: Comparison of Phasing and Imputation Algorithms in Real Data
Main authors: | Stahl, Katharina; Gola, Damian; König, Inke R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2021 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8493217/ https://www.ncbi.nlm.nih.gov/pubmed/34630519 http://dx.doi.org/10.3389/fgene.2021.724037 |
_version_ | 1784579076361551872 |
---|---|
author | Stahl, Katharina; Gola, Damian; König, Inke R. |
collection | PubMed |
description | Despite the widespread use of genotype imputation tools and the availability of different approaches, the latest versions of currently used programs have not been compared comprehensively. We therefore assessed the performance of 35 combinations of phasing and imputation programs, including versions of SHAPEIT, Eagle, Beagle, minimac, PBWT, and IMPUTE, for the imputation of completely missing SNPs using the HRC reference panel, with respect to both quality and speed. We used a data set comprising 1,149 fully sequenced individuals from the German population, subsetting the SNPs to approximate the Illumina Infinium-Omni5 array. A total of 553,234 SNPs across two selected chromosomes were used to compare imputed with sequenced genotypes. We found that all tested programs except PBWT impute genotypes with very high accuracy (mean error rate < 0.005). PBWT hardly ever imputes the less frequent allele correctly (mean concordance for genotypes including the minor allele < 0.0002). For all programs, imputation accuracy drops for rare alleles with a frequency < 0.05. Although concordance drops with decreasing genotype probability, overall concordance remains high, indicating that low genotype probabilities are rare. The mean concordance of SNPs with a genotype probability < 95% drops below 0.9, at which point disregarding imputed genotypes might prove favorable. For fast and accurate imputation, the combination of Eagle2.4.1 for phasing (using a reference panel) and Beagle5.1 for imputation performs best. Replacing Beagle5.1 with minimac3, minimac4, Beagle4.1, or IMPUTE4 yields a small gain in accuracy at a high cost in speed. |
format | Online Article Text |
id | pubmed-8493217 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8493217 2021-10-07 Front Genet Genetics Frontiers Media S.A. 2021-09-22 /pmc/articles/PMC8493217/ /pubmed/34630519 http://dx.doi.org/10.3389/fgene.2021.724037 Text en Copyright © 2021 Stahl, Gola and König. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Assessment of Imputation Quality: Comparison of Phasing and Imputation Algorithms in Real Data |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8493217/ https://www.ncbi.nlm.nih.gov/pubmed/34630519 http://dx.doi.org/10.3389/fgene.2021.724037 |
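The description field summarizes imputation quality in terms of error rate, concordance for genotypes carrying the minor allele, minor allele frequency, and genotype probability. The Python sketch below illustrates how such per-SNP metrics could be computed when sequenced genotypes serve as the gold standard. It is not the authors' evaluation code. The `SnpCalls` record, the 0/1/2 alternative-allele coding, the function names, and the exact metric definitions (including treating 1 minus concordance as a simple error rate) are assumptions made for illustration, and the toy data are invented.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SnpCalls:
    """Hypothetical per-SNP record (not a format used by any imputation tool).

    Genotypes are coded as counts of the alternative allele (0, 1 or 2);
    gp[k] is the posterior probability of the genotype imputed for sample k.
    """
    true_gt: Sequence[int]     # sequenced genotypes (gold standard)
    imputed_gt: Sequence[int]  # best-guess imputed genotypes
    gp: Sequence[float]        # genotype probability of each imputed call


def alt_allele_freq(snp: SnpCalls) -> float:
    """Alternative-allele frequency estimated from the sequenced genotypes."""
    return sum(snp.true_gt) / (2 * len(snp.true_gt))


def minor_allele_freq(snp: SnpCalls) -> float:
    """Minor allele frequency (MAF), i.e. min(p, 1 - p)."""
    p = alt_allele_freq(snp)
    return min(p, 1.0 - p)


def concordance(snp: SnpCalls, min_gp: float = 0.0) -> Optional[float]:
    """Fraction of imputed calls that match the sequenced genotype, counting
    only calls whose genotype probability is at least min_gp.
    Returns None when no call passes the threshold."""
    pairs = [(t, i) for t, i, g in zip(snp.true_gt, snp.imputed_gt, snp.gp)
             if g >= min_gp]
    if not pairs:
        return None
    return sum(t == i for t, i in pairs) / len(pairs)


def minor_allele_concordance(snp: SnpCalls) -> Optional[float]:
    """Concordance restricted to samples whose sequenced genotype carries at
    least one copy of the minor allele (at p = 0.5 the alternative allele
    is treated as minor). Returns None when no sample carries it."""
    alt_is_minor = alt_allele_freq(snp) <= 0.5
    pairs = [(t, i) for t, i in zip(snp.true_gt, snp.imputed_gt)
             if (t >= 1 if alt_is_minor else t <= 1)]
    if not pairs:
        return None
    return sum(t == i for t, i in pairs) / len(pairs)


if __name__ == "__main__":
    # Invented toy data: one SNP, ten individuals.
    snp = SnpCalls(
        true_gt=[0, 0, 0, 0, 0, 0, 0, 0, 1, 2],
        imputed_gt=[0, 0, 0, 0, 0, 0, 0, 0, 0, 2],
        gp=[0.99, 0.99, 0.98, 0.99, 0.97, 0.99, 0.99, 0.98, 0.60, 0.96],
    )
    print(minor_allele_freq(snp))          # 0.15
    print(concordance(snp))                # 0.9
    print(concordance(snp, min_gp=0.95))   # 1.0
    print(minor_allele_concordance(snp))   # 0.5
```

Averaging such per-SNP values over all SNPs, or over SNPs grouped by MAF (for example below and above 0.05), would give summaries comparable in spirit to those in the abstract. The `min_gp=0.95` call illustrates the kind of genotype-probability filter the abstract suggests might be favorable: here the single low-probability call is also the only wrong one, so filtering it raises concordance from 0.9 to 1.0.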