Assessment of Imputation Quality: Comparison of Phasing and Imputation Algorithms in Real Data

Despite the widespread use of genotype imputation tools and the availability of different approaches, recent developments of currently used programs have not been compared comprehensively. We therefore assessed the performance of 35 combinations of phasing and imputation programs, including versions o...

Bibliographic Details
Main Authors: Stahl, Katharina, Gola, Damian, König, Inke R.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8493217/
https://www.ncbi.nlm.nih.gov/pubmed/34630519
http://dx.doi.org/10.3389/fgene.2021.724037
_version_ 1784579076361551872
author Stahl, Katharina
Gola, Damian
König, Inke R.
author_facet Stahl, Katharina
Gola, Damian
König, Inke R.
author_sort Stahl, Katharina
collection PubMed
description Despite the widespread use of genotype imputation tools and the availability of different approaches, recent developments of currently used programs have not been compared comprehensively. We therefore assessed the performance of 35 combinations of phasing and imputation programs, including versions of SHAPEIT, Eagle, Beagle, minimac, PBWT, and IMPUTE, for genetic imputation of completely missing SNPs with an HRC reference panel regarding quality and speed. We used a data set comprising 1,149 fully sequenced individuals from the German population, subsetting the SNPs to approximate the Illumina Infinium-Omni5 array. A total of 553,234 SNPs across two selected chromosomes were utilized for comparison between imputed and sequenced genotypes. We found that all tested programs with the exception of PBWT impute genotypes with very high accuracy (mean error rate <0.005). PBWT hardly ever imputes the less frequent allele correctly (mean concordance for genotypes including the minor allele <0.0002). For all programs, imputation accuracy drops for rare alleles with a frequency <0.05. Even though overall concordance is high, concordance drops with genotype probability, indicating that low genotype probabilities are rare. The mean concordance of SNPs with a genotype probability <95% drops below 0.9, at which point disregarding imputed genotypes might prove favorable. For fast and accurate imputation, a combination of Eagle2.4.1 using a reference panel for phasing and Beagle5.1 for imputation performs best. Replacing Beagle5.1 with minimac3, minimac4, Beagle4.1, or IMPUTE4 results in a small gain in accuracy at a high cost of speed.
format Online
Article
Text
id pubmed-8493217
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8493217 2021-10-07 Assessment of Imputation Quality: Comparison of Phasing and Imputation Algorithms in Real Data Stahl, Katharina Gola, Damian König, Inke R. Front Genet Genetics Despite the widespread use of genotype imputation tools and the availability of different approaches, recent developments of currently used programs have not been compared comprehensively. We therefore assessed the performance of 35 combinations of phasing and imputation programs, including versions of SHAPEIT, Eagle, Beagle, minimac, PBWT, and IMPUTE, for genetic imputation of completely missing SNPs with an HRC reference panel regarding quality and speed. We used a data set comprising 1,149 fully sequenced individuals from the German population, subsetting the SNPs to approximate the Illumina Infinium-Omni5 array. A total of 553,234 SNPs across two selected chromosomes were utilized for comparison between imputed and sequenced genotypes. We found that all tested programs with the exception of PBWT impute genotypes with very high accuracy (mean error rate <0.005). PBWT hardly ever imputes the less frequent allele correctly (mean concordance for genotypes including the minor allele <0.0002). For all programs, imputation accuracy drops for rare alleles with a frequency <0.05. Even though overall concordance is high, concordance drops with genotype probability, indicating that low genotype probabilities are rare. The mean concordance of SNPs with a genotype probability <95% drops below 0.9, at which point disregarding imputed genotypes might prove favorable. For fast and accurate imputation, a combination of Eagle2.4.1 using a reference panel for phasing and Beagle5.1 for imputation performs best. Replacing Beagle5.1 with minimac3, minimac4, Beagle4.1, or IMPUTE4 results in a small gain in accuracy at a high cost of speed. Frontiers Media S.A.
2021-09-22 /pmc/articles/PMC8493217/ /pubmed/34630519 http://dx.doi.org/10.3389/fgene.2021.724037 Text en Copyright © 2021 Stahl, Gola and König. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Stahl, Katharina
Gola, Damian
König, Inke R.
Assessment of Imputation Quality: Comparison of Phasing and Imputation Algorithms in Real Data
title Assessment of Imputation Quality: Comparison of Phasing and Imputation Algorithms in Real Data
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8493217/
https://www.ncbi.nlm.nih.gov/pubmed/34630519
http://dx.doi.org/10.3389/fgene.2021.724037