An autoencoder-based deep learning method for genotype imputation

Bibliographic Details
Main authors: Song, Meng; Greenbaum, Jonathan; Luttrell, Joseph; Zhou, Weihua; Wu, Chong; Luo, Zhe; Qiu, Chuan; Zhao, Lan Juan; Su, Kuan-Jui; Tian, Qing; Shen, Hui; Hong, Huixiao; Gong, Ping; Shi, Xinghua; Deng, Hong-Wen; Zhang, Chaoyang
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9671213/
https://www.ncbi.nlm.nih.gov/pubmed/36406474
http://dx.doi.org/10.3389/frai.2022.1028978
author Song, Meng
Greenbaum, Jonathan
Luttrell, Joseph
Zhou, Weihua
Wu, Chong
Luo, Zhe
Qiu, Chuan
Zhao, Lan Juan
Su, Kuan-Jui
Tian, Qing
Shen, Hui
Hong, Huixiao
Gong, Ping
Shi, Xinghua
Deng, Hong-Wen
Zhang, Chaoyang
author_sort Song, Meng
collection PubMed
description Genotype imputation has a wide range of applications in genome-wide association studies (GWAS), including increasing the statistical power of association tests, discovering trait-associated loci in meta-analyses, and prioritizing causal variants with fine-mapping. In recent years, deep learning (DL)-based methods, such as the sparse convolutional denoising autoencoder (SCDA), have been developed for genotype imputation. However, optimizing the learning process in DL-based methods to achieve high imputation accuracy remains challenging. To address this challenge, we developed a convolutional autoencoder (AE) model for genotype imputation and implemented a customized training loop that modifies the training process to use a single batch loss rather than the average loss over batches. This modified AE imputation model was evaluated using a yeast dataset, the human leukocyte antigen (HLA) data from the 1000 Genomes Project (1KGP), and our in-house genotype data from the Louisiana Osteoporosis Study (LOS). The modified AE imputation model achieved comparable or better performance than the existing SCDA model on evaluation metrics such as the concordance rate (CR), the Hellinger score, the scaled Euclidean norm (SEN) score, and the imputation quality score (IQS) in all three datasets. Taking the imputation results from the HLA data as an example, the AE model achieved an average CR of 0.9468 and 0.9459, Hellinger score of 0.9765 and 0.9518, SEN score of 0.9977 and 0.9953, and IQS of 0.9515 and 0.9044 at missing ratios of 10% and 20%, respectively. On the LOS data, it achieved an average CR of 0.9005, Hellinger score of 0.9384, SEN score of 0.9940, and IQS of 0.8681 at a missing ratio of 20%. In summary, our proposed method for genotype imputation has great potential to increase the statistical power of GWAS and improve downstream post-GWAS analyses.
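The evaluation described in the abstract (masking genotypes at a given missing ratio, then scoring the imputed values) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes genotypes coded as 0/1/2, uses a hypothetical sentinel `MISSING = -1`, and relies on common textbook definitions: concordance rate as the fraction of exact matches, IQS as a chance-corrected (kappa-style) agreement, and the Hellinger score as 1 minus the Hellinger distance between two genotype probability vectors. The paper may define or average these metrics differently (for example, per variant), and the SEN score is omitted because its exact scaling is not given in this record.

```python
import math
import random
from collections import Counter

MISSING = -1  # hypothetical sentinel for a masked genotype (assumption)


def mask_genotypes(genotypes, missing_ratio, seed=0):
    """Copy `genotypes` and set round(n * missing_ratio) random entries to MISSING."""
    rng = random.Random(seed)
    n_masked = round(len(genotypes) * missing_ratio)
    masked_idx = sorted(rng.sample(range(len(genotypes)), n_masked))
    masked = list(genotypes)
    for i in masked_idx:
        masked[i] = MISSING
    return masked, masked_idx


def concordance_rate(true, imputed):
    """Fraction of positions where the imputed genotype equals the true one."""
    matches = sum(t == p for t, p in zip(true, imputed))
    return matches / len(true)


def imputation_quality_score(true, imputed):
    """Chance-corrected agreement in kappa form: (Po - Pc) / (1 - Pc)."""
    n = len(true)
    po = concordance_rate(true, imputed)
    true_counts = Counter(true)
    imputed_counts = Counter(imputed)
    # Pc: agreement expected by chance from the marginal genotype frequencies.
    pc = sum(true_counts[g] * imputed_counts[g] for g in true_counts) / (n * n)
    if pc == 1.0:
        return float("nan")  # undefined when chance agreement is total
    return (po - pc) / (1 - pc)


def hellinger_score(p, q):
    """1 minus the Hellinger distance between two probability vectors."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return 1.0 - math.sqrt(squared) / math.sqrt(2)
```

In practice the metrics would be computed only at the indices returned by `mask_genotypes`, comparing the held-out true genotypes with the model's predictions at those positions.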
format Online
Article
Text
id pubmed-9671213
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9671213 2022-11-18 An autoencoder-based deep learning method for genotype imputation. Front Artif Intell. Artificial Intelligence. Frontiers Media S.A. 2022-11-03 /pmc/articles/PMC9671213/ /pubmed/36406474 http://dx.doi.org/10.3389/frai.2022.1028978 Text en
Copyright © 2022 Song, Greenbaum, Luttrell, Zhou, Wu, Luo, Qiu, Zhao, Su, Tian, Shen, Hong, Gong, Shi, Deng and Zhang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title An autoencoder-based deep learning method for genotype imputation
topic Artificial Intelligence