
Building an optimal predictive model for imputing tissue-specific gene expression by combining genotype and whole-blood transcriptome data

Accurate imputation of tissue-specific gene expression can be a powerful tool for understanding the biological mechanisms underlying human complex traits. Existing imputation methods can be grouped into two categories according to the types of predictors used. The first category uses genotype data,...


Bibliographic Details
Main authors: Jung, Sunwoo; Lee, Cue Hyunkyu; Sul, Jae Hoon; Han, Buhm
Format: Online Article Text
Language: English
Published: Elsevier 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10413136/
https://www.ncbi.nlm.nih.gov/pubmed/37576186
http://dx.doi.org/10.1016/j.xhgg.2023.100223
author Jung, Sunwoo
Lee, Cue Hyunkyu
Sul, Jae Hoon
Han, Buhm
collection PubMed
description Accurate imputation of tissue-specific gene expression can be a powerful tool for understanding the biological mechanisms underlying human complex traits. Existing imputation methods can be grouped into two categories according to the types of predictors used. The first category uses genotype data, while the second category uses whole-blood expression data. Both data types can be easily collected from blood, avoiding invasive tissue biopsies. In this study, we attempted to build an optimal predictive model for imputing tissue-specific gene expression by combining the genotype and whole-blood expression data. We first evaluated the imputation performance of each standalone model (using genotype data [GEN model] and using whole-blood expression data [WBE model]) using their respective data types across 47 human tissues. The WBE model outperformed the GEN model by a large margin in most tissues. Then, we developed several combined models that leverage both types of predictors to further improve imputation performance. We tried various strategies, including utilizing a merged dataset of the two data types (MERGED models) and integrating the imputation outcomes of the two standalone models (inverse variance-weighted [IVW] models). We found that one of the MERGED models noticeably outperformed the standalone models. This model involved a fixed ratio between the two regularization penalty factors for the two predictor types so that the contribution of the whole-blood transcriptome is upweighted compared with the genotype. Our study suggests that one can improve the imputation of tissue-specific gene expression by combining the genotype and whole-blood expression, but the improvement can depend largely on the combination strategy chosen.
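The inverse variance-weighted (IVW) idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ivw_combine`, the use of a single error variance per model, and the toy numbers are all assumptions made for illustration. Each standalone prediction is weighted by the reciprocal of its estimated error variance, so the model with the smaller error contributes more.

```python
def ivw_combine(pred_gen, pred_wbe, var_gen, var_wbe):
    """Combine two sets of predictions with inverse-variance weights.

    pred_gen, pred_wbe: predicted expression values per sample from a
        genotype-based and a whole-blood-expression-based model.
    var_gen, var_wbe: estimated prediction-error variance of each model
        (hypothetical inputs; how they are estimated is not specified here).
    """
    if var_gen <= 0 or var_wbe <= 0:
        raise ValueError("variances must be positive")
    w_gen = 1.0 / var_gen
    w_wbe = 1.0 / var_wbe
    total = w_gen + w_wbe
    combined = [
        (w_gen * g + w_wbe * b) / total
        for g, b in zip(pred_gen, pred_wbe)
    ]
    # Variance of the combined estimate, valid only if the two models'
    # errors are independent (an assumption that may not hold in practice).
    combined_var = 1.0 / total
    return combined, combined_var


# Toy values, made up for illustration only.
gen = [0.2, -0.1, 0.5]
wbe = [0.6, 0.3, 0.1]
combined, combined_var = ivw_combine(gen, wbe, var_gen=4.0, var_wbe=1.0)
print(combined, combined_var)
```

With these toy inputs the whole-blood prediction receives four times the weight of the genotype prediction, because its assumed error variance is four times smaller.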
format Online
Article
Text
id pubmed-10413136
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10413136 2023-08-11; HGG Adv; Article; Elsevier 2023-07-11; /pmc/articles/PMC10413136/; /pubmed/37576186; http://dx.doi.org/10.1016/j.xhgg.2023.100223; Text; en; © 2023 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Building an optimal predictive model for imputing tissue-specific gene expression by combining genotype and whole-blood transcriptome data
topic Article