An autoencoder-based deep learning method for genotype imputation

Bibliographic Details
Main Authors:	Song, Meng; Greenbaum, Jonathan; Luttrell, Joseph; Zhou, Weihua; Wu, Chong; Luo, Zhe; Qiu, Chuan; Zhao, Lan Juan; Su, Kuan-Jui; Tian, Qing; Shen, Hui; Hong, Huixiao; Gong, Ping; Shi, Xinghua; Deng, Hong-Wen; Zhang, Chaoyang
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Artificial Intelligence
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9671213/ https://www.ncbi.nlm.nih.gov/pubmed/36406474 http://dx.doi.org/10.3389/frai.2022.1028978

collection	PubMed
description	Genotype imputation has a wide range of applications in genome-wide association studies (GWAS), including increasing the statistical power of association tests, discovering trait-associated loci in meta-analyses, and prioritizing causal variants with fine-mapping. In recent years, deep learning (DL)-based methods, such as the sparse convolutional denoising autoencoder (SCDA), have been developed for genotype imputation. However, optimizing the learning process in DL-based methods to achieve high imputation accuracy remains challenging. To address this challenge, we have developed a convolutional autoencoder (AE) model for genotype imputation and implemented a customized training loop in which the training process uses a single batch loss rather than the average loss over batches. This modified AE imputation model was evaluated using a yeast dataset, the human leukocyte antigen (HLA) data from the 1,000 Genomes Project (1KGP), and our in-house genotype data from the Louisiana Osteoporosis Study (LOS). Our modified AE imputation model achieved performance comparable to or better than that of the existing SCDA model in all three datasets, in terms of evaluation metrics such as the concordance rate (CR), the Hellinger score, the scaled Euclidean norm (SEN) score, and the imputation quality score (IQS). Taking the imputation results from the HLA data as an example, the AE model achieved an average CR of 0.9468 and 0.9459, Hellinger score of 0.9765 and 0.9518, SEN score of 0.9977 and 0.9953, and IQS of 0.9515 and 0.9044 at missing ratios of 10% and 20%, respectively. For the LOS data, it achieved an average CR of 0.9005, Hellinger score of 0.9384, SEN score of 0.9940, and IQS of 0.8681 at a missing ratio of 20%. In summary, our proposed method for genotype imputation has great potential to increase the statistical power of GWAS and improve downstream post-GWAS analyses.
id	pubmed-9671213
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Front Artif Intell
published	2022-11-03
record	pubmed-9671213 (2022-11-18)
rights	Copyright © 2022 Song, Greenbaum, Luttrell, Zhou, Wu, Luo, Qiu, Zhao, Su, Tian, Shen, Hong, Gong, Shi, Deng and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
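
The abstract reports the concordance rate (CR) and imputation quality score (IQS) at fixed missing ratios (10% and 20%). As a rough illustration of how such an evaluation can be set up, here is a minimal pure-Python sketch under assumptions this record does not state: genotypes are coded as allele dosages 0/1/2, entries are masked uniformly at random, CR is the fraction of masked genotypes imputed exactly, and IQS is computed as a kappa-style chance-corrected agreement, (P_o - P_c) / (1 - P_c), on hard genotype calls. The helper names (`mask_genotypes`, `impute_mode`, `concordance_rate`, `iqs`) are hypothetical, and `impute_mode` is a trivial per-SNP most-frequent-genotype placeholder, not the authors' autoencoder. The Hellinger and SEN scores are not sketched here.

```python
import random
from collections import Counter

# Sketch only: genotypes are assumed to be coded as allele dosages 0, 1, 2,
# stored as a list of samples (rows) by SNPs (columns).

def mask_genotypes(matrix, missing_ratio, seed=0):
    """Hide a random fraction of entries; return the masked copy and hidden positions."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    hidden = rng.sample(cells, round(missing_ratio * len(cells)))
    masked = [row[:] for row in matrix]
    for i, j in hidden:
        masked[i][j] = None  # None marks a missing genotype
    return masked, hidden


def impute_mode(masked):
    """Hypothetical placeholder imputer (not the autoencoder): fill each SNP
    column with its most common observed genotype, or 0 if none is observed."""
    filled = [row[:] for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        mode = Counter(observed).most_common(1)[0][0] if observed else 0
        for row in filled:
            if row[j] is None:
                row[j] = mode
    return filled


def concordance_rate(truth, imputed):
    """Fraction of masked genotypes whose imputed value equals the true value."""
    return sum(t == p for t, p in zip(truth, imputed)) / len(truth)


def iqs(truth, imputed):
    """Kappa-style score on hard calls: (P_o - P_c) / (1 - P_c), where P_c is
    the agreement expected by chance from the two genotype frequency tables."""
    n = len(truth)
    p_o = concordance_rate(truth, imputed)
    true_counts, imp_counts = Counter(truth), Counter(imputed)
    p_c = sum(true_counts[g] * imp_counts[g] for g in true_counts) / (n * n)
    if p_c == 1.0:
        return 1.0  # only possible when truth and imputed are the same single genotype
    return (p_o - p_c) / (1 - p_c)


# Toy example: 5 samples x 4 SNPs, 20% of entries masked.
truth = [
    [0, 1, 2, 0],
    [0, 1, 2, 1],
    [1, 1, 2, 0],
    [0, 2, 2, 0],
    [0, 1, 1, 0],
]
masked, hidden = mask_genotypes(truth, missing_ratio=0.2)
imputed = impute_mode(masked)
t = [truth[i][j] for i, j in hidden]
p = [imputed[i][j] for i, j in hidden]
print(f"CR={concordance_rate(t, p):.3f} IQS={iqs(t, p):.3f}")
```

Replacing `impute_mode` with a trained model's predictions leaves the scoring unchanged; when comparing methods, reusing the same masked positions ensures CR and IQS are computed over identical entries.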
