
GEN2VCF: a converter for human genome imputation output format to VCF format

BACKGROUND: For a genome-wide association study in humans, genotype imputation is an essential analysis tool for improving association mapping power. When IMPUTE software is used for imputation analysis, an imputation output (GEN format) should be converted to variant call format (VCF) with imputed...


Bibliographic Details
Main Authors: Shin, Dong Mun; Hwang, Mi Yeong; Kim, Bong-Jo; Ryu, Keun Ho; Kim, Young Jin
Format: Online Article Text
Language: English
Published: Springer Singapore 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7497724/
https://www.ncbi.nlm.nih.gov/pubmed/32803703
http://dx.doi.org/10.1007/s13258-020-00982-0
author Shin, Dong Mun
Hwang, Mi Yeong
Kim, Bong-Jo
Ryu, Keun Ho
Kim, Young Jin
collection PubMed
description BACKGROUND: In human genome-wide association studies, genotype imputation is an essential tool for improving association mapping power. When IMPUTE software is used for imputation, its output (GEN format) should be converted to variant call format (VCF) with imputed genotype dosage for association analysis. However, the conversion requires a pipeline of multiple software packages and a large amount of processing time. OBJECTIVE: We developed GEN2VCF, a fast and convenient GEN-to-VCF conversion tool with dosage support. METHODS: The performance of GEN2VCF was compared with that of BCFtools, QCTOOL, and Oncofunco. The test data set was a 1 Mb GEN-formatted file of 5000 samples. To assess performance across sample sizes, tests were run on 1000 to 5000 samples in steps of 1000. Runtime and memory usage were the performance measures. RESULTS: GEN2VCF substantially outperformed the other methods in both runtime and memory usage: its runtime was at least 1.4-fold lower and its memory usage at least 7.4-fold lower. CONCLUSIONS: GEN2VCF provides efficient conversion from GEN format to VCF with best-guess genotypes, genotype posterior probabilities, and genotype dosage, and it integrates flexibly with other software packages in a pipeline.
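The conversion described in the abstract can be illustrated with a minimal sketch. This is not GEN2VCF's implementation; it only shows how one GEN record might map to a VCF data line carrying the best-guess genotype (GT), genotype posterior probabilities (GP), and dosage (DS). It assumes an IMPUTE2-style GEN line with five leading columns followed by three probabilities per sample (AA, AB, BB), treats allele A as REF and allele B as ALT, and takes the chromosome from the caller. The function name `gen_line_to_vcf` and the `call_threshold` default of 0.9 are hypothetical choices made for illustration.

```python
from typing import List

# Genotype labels for the AA, AB, BB probabilities, assuming allele A is REF.
GENOTYPES = ("0/0", "0/1", "1/1")


def gen_line_to_vcf(line: str, chrom: str, call_threshold: float = 0.9) -> str:
    """Convert one GEN record to a tab-separated VCF data line (sketch)."""
    fields = line.split()
    # Assumed layout: SNP ID, rsID, position, allele A, allele B, then probabilities.
    _snp_id, rsid, pos, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three probabilities per sample")

    samples: List[str] = []
    for i in range(0, len(probs), 3):
        trio = probs[i:i + 3]
        p_aa, p_ab, p_bb = trio
        # Best-guess genotype: the most probable call, or missing if it is
        # below the (hypothetical) calling threshold.
        best = max(range(3), key=lambda k: trio[k])
        gt = GENOTYPES[best] if trio[best] >= call_threshold else "./."
        # Expected count of the ALT allele (allele B).
        dosage = p_ab + 2.0 * p_bb
        samples.append(f"{gt}:{p_aa:.3f},{p_ab:.3f},{p_bb:.3f}:{dosage:.3f}")

    fixed = [chrom, pos, rsid, allele_a, allele_b, ".", ".", ".", "GT:GP:DS"]
    return "\t".join(fixed + samples)


if __name__ == "__main__":
    example = "--- rs1 1000 A G 0.95 0.05 0 0.1 0.8 0.1 0 0.02 0.98"
    print(gen_line_to_vcf(example, chrom="1"))
```

A complete VCF file would also need header lines that declare the GT, GP, and DS FORMAT fields and name the samples; those are omitted here to keep the sketch short.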
format Online
Article
Text
id pubmed-7497724
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Springer Singapore
record_format MEDLINE/PubMed
spelling pubmed-7497724 2020-09-28 Genes Genomics Research Article
Springer Singapore 2020-08-16 2020 /pmc/articles/PMC7497724/ /pubmed/32803703 http://dx.doi.org/10.1007/s13258-020-00982-0 Text en © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title GEN2VCF: a converter for human genome imputation output format to VCF format
topic Research Article