
Impact of pre-imputation SNP-filtering on genotype imputation results

BACKGROUND: Imputation of partially missing or unobserved genotypes is an indispensable tool for SNP data analyses. However, research and understanding of the impact of initial SNP-data quality control on imputation results is still limited. In this paper, we aim to evaluate the effect of different...


Bibliographic Details
Main authors: Roshyara, Nab Raj; Kirsten, Holger; Horn, Katrin; Ahnert, Peter; Scholz, Markus
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4236550/
https://www.ncbi.nlm.nih.gov/pubmed/25112433
http://dx.doi.org/10.1186/s12863-014-0088-5
author Roshyara, Nab Raj
Kirsten, Holger
Horn, Katrin
Ahnert, Peter
Scholz, Markus
collection PubMed
description BACKGROUND: Imputation of partially missing or unobserved genotypes is an indispensable tool for SNP data analyses. However, research and understanding of the impact of initial SNP-data quality control on imputation results is still limited. In this paper, we aim to evaluate the effect of different strategies of pre-imputation quality filtering on the performance of the widely used imputation algorithms MaCH and IMPUTE. RESULTS: We considered three scenarios: imputation of partially missing genotypes with usage of an external reference panel, without usage of an external reference panel, as well as imputation of completely un-typed SNPs using an external reference panel. We first created various datasets applying different SNP quality filters and masking certain percentages of randomly selected high-quality SNPs. We imputed these SNPs and compared the results between the different filtering scenarios by using established and newly proposed measures of imputation quality. While the established measures assess certainty of imputation results, our newly proposed measures focus on the agreement with true genotypes. These measures showed that pre-imputation SNP-filtering might be detrimental regarding imputation quality. Moreover, the strongest drivers of imputation quality were in general the burden of missingness and the number of SNPs used for imputation. We also found that using a reference panel always improves imputation quality of partially missing genotypes. MaCH performed slightly better than IMPUTE2 in most of our scenarios. Again, these results were more pronounced when using our newly defined measures of imputation quality. CONCLUSION: Even a moderate filtering has a detrimental effect on the imputation quality. Therefore little or no SNP filtering prior to imputation appears to be the best strategy for imputing small to moderately sized datasets. Our results also showed that for these datasets, MaCH performs slightly better than IMPUTE2 in most scenarios at the cost of increased computing time.
format Online
Article
Text
id pubmed-4236550
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4236550 2014-11-19 BMC Genet Research Article BioMed Central 2014-08-12 Text en Copyright © 2014 Roshyara et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Impact of pre-imputation SNP-filtering on genotype imputation results
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4236550/
https://www.ncbi.nlm.nih.gov/pubmed/25112433
http://dx.doi.org/10.1186/s12863-014-0088-5