Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases

High-dimensional clinical data are becoming more accessible in biobank-scale datasets. However, effectively utilizing high-dimensional clinical data for genetic discovery remains challenging. Here we introduce a general deep learning-based framework, REpresentation learning for Genetic discovery on...

Bibliographic Details
Main authors: Yun, Taedong, Cosentino, Justin, Behsaz, Babak, McCaw, Zachary R., Hill, Davin, Luben, Robert, Lai, Dongbing, Bates, John, Yang, Howard, Schwantes-An, Tae-Hwi, Zhou, Yuchen, Khawaja, Anthony P., Carroll, Andrew, Hobbs, Brian D., Cho, Michael H., McLean, Cory Y., Hormozdiari, Farhad
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168505/
https://www.ncbi.nlm.nih.gov/pubmed/37163049
http://dx.doi.org/10.1101/2023.04.28.23289285
_version_ 1785038867941818368
author Yun, Taedong
Cosentino, Justin
Behsaz, Babak
McCaw, Zachary R.
Hill, Davin
Luben, Robert
Lai, Dongbing
Bates, John
Yang, Howard
Schwantes-An, Tae-Hwi
Zhou, Yuchen
Khawaja, Anthony P.
Carroll, Andrew
Hobbs, Brian D.
Cho, Michael H.
McLean, Cory Y.
Hormozdiari, Farhad
author_facet Yun, Taedong
Cosentino, Justin
Behsaz, Babak
McCaw, Zachary R.
Hill, Davin
Luben, Robert
Lai, Dongbing
Bates, John
Yang, Howard
Schwantes-An, Tae-Hwi
Zhou, Yuchen
Khawaja, Anthony P.
Carroll, Andrew
Hobbs, Brian D.
Cho, Michael H.
McLean, Cory Y.
Hormozdiari, Farhad
author_sort Yun, Taedong
collection PubMed
description High-dimensional clinical data are becoming more accessible in biobank-scale datasets. However, effectively utilizing high-dimensional clinical data for genetic discovery remains challenging. Here we introduce a general deep learning-based framework, REpresentation learning for Genetic discovery on Low-dimensional Embeddings (REGLE), for discovering associations between genetic variants and high-dimensional clinical data. REGLE uses convolutional variational autoencoders to compute a non-linear, low-dimensional, disentangled embedding of the data with highly heritable individual components. REGLE can incorporate expert-defined or clinical features and provides a framework to create accurate disease-specific polygenic risk scores (PRS) in datasets which have minimal expert phenotyping. We apply REGLE to both respiratory and circulatory systems: spirograms which measure lung function and photoplethysmograms (PPG) which measure blood volume changes. Genome-wide association studies on REGLE embeddings identify more genome-wide significant loci than existing methods and replicate known loci for both spirograms and PPG, demonstrating the generality of the framework. Furthermore, these embeddings are associated with overall survival. Finally, we construct a set of PRSs that improve predictive performance of asthma, chronic obstructive pulmonary disease, hypertension, and systolic blood pressure in multiple biobanks. Thus, REGLE embeddings can quantify clinically relevant features that are not currently captured in a standardized or automated way.
format Online
Article
Text
id pubmed-10168505
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-101685052023-05-10 Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases Yun, Taedong Cosentino, Justin Behsaz, Babak McCaw, Zachary R. Hill, Davin Luben, Robert Lai, Dongbing Bates, John Yang, Howard Schwantes-An, Tae-Hwi Zhou, Yuchen Khawaja, Anthony P. Carroll, Andrew Hobbs, Brian D. Cho, Michael H. McLean, Cory Y. Hormozdiari, Farhad medRxiv Article High-dimensional clinical data are becoming more accessible in biobank-scale datasets. However, effectively utilizing high-dimensional clinical data for genetic discovery remains challenging. Here we introduce a general deep learning-based framework, REpresentation learning for Genetic discovery on Low-dimensional Embeddings (REGLE), for discovering associations between genetic variants and high-dimensional clinical data. REGLE uses convolutional variational autoencoders to compute a non-linear, low-dimensional, disentangled embedding of the data with highly heritable individual components. REGLE can incorporate expert-defined or clinical features and provides a framework to create accurate disease-specific polygenic risk scores (PRS) in datasets which have minimal expert phenotyping. We apply REGLE to both respiratory and circulatory systems: spirograms which measure lung function and photoplethysmograms (PPG) which measure blood volume changes. Genome-wide association studies on REGLE embeddings identify more genome-wide significant loci than existing methods and replicate known loci for both spirograms and PPG, demonstrating the generality of the framework. Furthermore, these embeddings are associated with overall survival. Finally, we construct a set of PRSs that improve predictive performance of asthma, chronic obstructive pulmonary disease, hypertension, and systolic blood pressure in multiple biobanks. 
Thus, REGLE embeddings can quantify clinically relevant features that are not currently captured in a standardized or automated way. Cold Spring Harbor Laboratory 2023-08-29 /pmc/articles/PMC10168505/ /pubmed/37163049 http://dx.doi.org/10.1101/2023.04.28.23289285 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
spellingShingle Article
Yun, Taedong
Cosentino, Justin
Behsaz, Babak
McCaw, Zachary R.
Hill, Davin
Luben, Robert
Lai, Dongbing
Bates, John
Yang, Howard
Schwantes-An, Tae-Hwi
Zhou, Yuchen
Khawaja, Anthony P.
Carroll, Andrew
Hobbs, Brian D.
Cho, Michael H.
McLean, Cory Y.
Hormozdiari, Farhad
Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases
title Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases
title_full Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases
title_fullStr Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases
title_full_unstemmed Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases
title_short Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases
title_sort unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168505/
https://www.ncbi.nlm.nih.gov/pubmed/37163049
http://dx.doi.org/10.1101/2023.04.28.23289285
work_keys_str_mv AT yuntaedong unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT cosentinojustin unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT behsazbabak unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT mccawzacharyr unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT hilldavin unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT lubenrobert unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT laidongbing unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT batesjohn unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT yanghoward unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT schwantesantaehwi unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT zhouyuchen unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT khawajaanthonyp unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT carrollandrew unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT hobbsbriand unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT chomichaelh unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT mcleancoryy unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases
AT hormozdiarifarhad unsupervisedrepresentationlearningimprovesgenomicdiscoveryandriskpredictionforrespiratoryandcirculatoryfunctionsanddiseases