
imputeqc: an R package for assessing imputation quality of genotypes and optimizing imputation parameters

BACKGROUND: Genotype imputation increases the power of genome-wide association studies, but imputation quality should be assessed in each particular case. Not all imputation software offers a way to assess the error of its output; for example, the latest release of fastPHASE (1.4.8) lack...


Bibliographic Details
Main Authors: Khvorykh, Gennady V., Khrunin, Andrey V.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7379353/
https://www.ncbi.nlm.nih.gov/pubmed/32703240
http://dx.doi.org/10.1186/s12859-020-03589-0
author Khvorykh, Gennady V.
Khrunin, Andrey V.
collection PubMed
description BACKGROUND: Genotype imputation increases the power of genome-wide association studies, but imputation quality should be assessed in each particular case. Not all imputation software offers a way to assess the error of its output; for example, the latest release of fastPHASE (1.4.8) lacks such an option. fastPHASE also leaves the choice of model parameters uncertain: it is based on haplotype clusters, whose number must be set a priori, and this parameter influences the results of imputation and downstream analysis. RESULTS: We present imputeqc, a software toolkit to assess imputation quality and/or to choose the model parameters for imputation. We demonstrate the efficacy of the toolkit by evaluating imputations made with both fastPHASE and BEAGLE on HapMap and 1000 Genomes data. The genotype discordance obtained with the two methods correlated well. Using imputeqc, we also show how to choose the optimal number of haplotype clusters and expectation-maximization cycles for fastPHASE. The number of haplotype clusters found, 25, was then used for hapFLK testing, which revealed signatures of selection at the LCT region on chromosome 2. We also demonstrate how to decrease the computational time of hapFLK testing from 3 days to 20 h. CONCLUSIONS: The toolkit is implemented as an R package, imputeqc, and command-line scripts. The code is freely available at https://github.com/inzilico/imputeqc under the MIT license.
format Online
Article
Text
id pubmed-7379353
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7379353 2020-08-04 imputeqc: an R package for assessing imputation quality of genotypes and optimizing imputation parameters Khvorykh, Gennady V. Khrunin, Andrey V. BMC Bioinformatics Software
BioMed Central 2020-07-24 /pmc/articles/PMC7379353/ /pubmed/32703240 http://dx.doi.org/10.1186/s12859-020-03589-0 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title imputeqc: an R package for assessing imputation quality of genotypes and optimizing imputation parameters
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7379353/
https://www.ncbi.nlm.nih.gov/pubmed/32703240
http://dx.doi.org/10.1186/s12859-020-03589-0
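The description in this record names genotype discordance as the quality measure but does not show how such a figure is computed. The following is a minimal Python sketch of one common way to obtain it, assuming a mask-and-compare scheme: hide a fraction of known genotypes, impute them, and count how many imputed values differ from the hidden truth. It is not the imputeqc implementation or its API. All function names are hypothetical, genotypes are assumed to be coded as 0/1/2 allele counts with None for missing, and the per-marker majority imputer is only a toy stand-in for a real tool such as fastPHASE or BEAGLE.

```python
import random
from collections import Counter

MISSING = None  # assumption: missing genotypes are represented as None


def mask_genotypes(genotypes, fraction, seed=0):
    """Hide a fraction of the known genotypes.

    Returns a masked copy of the matrix (rows = samples, columns = markers)
    and the list of (sample, marker) positions that were hidden.
    """
    rng = random.Random(seed)
    known = [
        (i, j)
        for i, row in enumerate(genotypes)
        for j, g in enumerate(row)
        if g is not MISSING
    ]
    hidden = rng.sample(known, int(round(fraction * len(known))))
    masked = [list(row) for row in genotypes]  # leave the input untouched
    for i, j in hidden:
        masked[i][j] = MISSING
    return masked, hidden


def impute_major_genotype(masked):
    """Toy imputer (hypothetical): fill each gap with the marker's most common genotype."""
    n_markers = len(masked[0])
    filled = [list(row) for row in masked]
    for j in range(n_markers):
        counts = Counter(row[j] for row in masked if row[j] is not MISSING)
        fallback = counts.most_common(1)[0][0] if counts else 0
        for row in filled:
            if row[j] is MISSING:
                row[j] = fallback
    return filled


def discordance(truth, imputed, hidden):
    """Proportion of hidden genotypes whose imputed value differs from the truth."""
    if not hidden:
        raise ValueError("no masked positions to compare")
    mismatches = sum(1 for i, j in hidden if truth[i][j] != imputed[i][j])
    return mismatches / len(hidden)


if __name__ == "__main__":
    truth = [[0, 1, 2, 0, 1], [0, 1, 2, 0, 2], [0, 1, 1, 0, 1]]
    masked, hidden = mask_genotypes(truth, fraction=0.2, seed=1)
    imputed = impute_major_genotype(masked)
    print("discordance:", discordance(truth, imputed, hidden))
```

Under these assumptions, repeating the mask, impute, and compare steps for several candidate parameter values (for example, different numbers of haplotype clusters) and keeping the value with the lowest discordance would be one way to approach the parameter choice mentioned in the description.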