Comparison of multiple imputation algorithms and verification using whole-genome sequencing in the CMUH genetic biobank

A genome-wide association study (GWAS) can be conducted to systematically analyze the contributions of genetic factors to a wide variety of complex diseases. Nevertheless, existing GWASs have provided highly ethnicity-specific data. Accordingly, to provide data specific to Taiwan, we established a larg...

Bibliographic Details
Main Authors: Liu, Ting-Yuan; Lin, Chih-Fan; Wu, Hsing-Tsung; Wu, Ya-Lun; Chen, Yu-Chia; Liao, Chi-Chou; Chou, Yu-Pao; Chao, Dysan; Chang, Ya-Sian; Lu, Hsing-Fang; Chang, Jan-Gowth; Hsu, Kai-Cheng; Tsai, Fuu-Jen
Format: Online Article Text
Language: English
Published: China Medical University 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8823485/
https://www.ncbi.nlm.nih.gov/pubmed/35223420
http://dx.doi.org/10.37796/2211-8039.1302
author Liu, Ting-Yuan
Lin, Chih-Fan
Wu, Hsing-Tsung
Wu, Ya-Lun
Chen, Yu-Chia
Liao, Chi-Chou
Chou, Yu-Pao
Chao, Dysan
Chang, Ya-Sian
Lu, Hsing-Fang
Chang, Jan-Gowth
Hsu, Kai-Cheng
Tsai, Fuu-Jen
collection PubMed
description A genome-wide association study (GWAS) can be conducted to systematically analyze the contributions of genetic factors to a wide variety of complex diseases. Nevertheless, existing GWASs have provided highly ethnicity-specific data. Accordingly, to provide data specific to Taiwan, we established a large-scale genetic database in a single medical institution at the China Medical University Hospital. Given current technological limitations, microarray analysis can detect only a limited number of single-nucleotide polymorphisms (SNPs) with a minor allele frequency of >1%. Imputation, however, offers a useful means of expanding the data. In this study, we compared four imputation algorithms in terms of various metrics. We observed that among the compared algorithms, Beagle5.2 achieved the fastest calculation speed, smallest storage space, highest specificity, and highest number of high-quality variants. We obtained 15,277,414 high-quality variants in 175,871 people by using Beagle5.2. In our internal verification process, Beagle5.2 exhibited an accuracy rate of up to 98.75%. We also conducted external verification, in which our imputed variants had a 79.91% mapping rate and 90.41% accuracy. These results will be combined with clinical data in future research. We have made the results available for researchers to use in formulating imputation algorithms, in addition to establishing a complete SNP database for GWAS and PRS researchers. We believe that these data can help improve overall medical capabilities, particularly precision medicine, in Taiwan.
format Online
Article
Text
id pubmed-8823485
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher China Medical University
record_format MEDLINE/PubMed
spelling pubmed-8823485 2022-02-25 Comparison of multiple imputation algorithms and verification using whole-genome sequencing in the CMUH genetic biobank Biomedicine (Taipei) Original Article
China Medical University 2021-12-01 /pmc/articles/PMC8823485/ /pubmed/35223420 http://dx.doi.org/10.37796/2211-8039.1302 Text en © the Author(s) This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).
title Comparison of multiple imputation algorithms and verification using whole-genome sequencing in the CMUH genetic biobank
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8823485/
https://www.ncbi.nlm.nih.gov/pubmed/35223420
http://dx.doi.org/10.37796/2211-8039.1302