Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases
High-dimensional clinical data are becoming more accessible in biobank-scale datasets. However, effectively utilizing high-dimensional clinical data for genetic discovery remains challenging. Here we introduce a general deep learning-based framework, REpresentation learning for Genetic discovery on...
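Main Authors: | Yun, Taedong; Cosentino, Justin; Behsaz, Babak; McCaw, Zachary R.; Hill, Davin; Luben, Robert; Lai, Dongbing; Bates, John; Yang, Howard; Schwantes-An, Tae-Hwi; Zhou, Yuchen; Khawaja, Anthony P.; Carroll, Andrew; Hobbs, Brian D.; Cho, Michael H.; McLean, Cory Y.; Hormozdiari, Farhad |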
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168505/ https://www.ncbi.nlm.nih.gov/pubmed/37163049 http://dx.doi.org/10.1101/2023.04.28.23289285 |
_version_ | 1785038867941818368 |
---|---|
author | Yun, Taedong Cosentino, Justin Behsaz, Babak McCaw, Zachary R. Hill, Davin Luben, Robert Lai, Dongbing Bates, John Yang, Howard Schwantes-An, Tae-Hwi Zhou, Yuchen Khawaja, Anthony P. Carroll, Andrew Hobbs, Brian D. Cho, Michael H. McLean, Cory Y. Hormozdiari, Farhad |
author_sort | Yun, Taedong |
collection | PubMed |
description | High-dimensional clinical data are becoming more accessible in biobank-scale datasets. However, effectively utilizing high-dimensional clinical data for genetic discovery remains challenging. Here we introduce a general deep learning-based framework, REpresentation learning for Genetic discovery on Low-dimensional Embeddings (REGLE), for discovering associations between genetic variants and high-dimensional clinical data. REGLE uses convolutional variational autoencoders to compute a non-linear, low-dimensional, disentangled embedding of the data with highly heritable individual components. REGLE can incorporate expert-defined or clinical features and provides a framework to create accurate disease-specific polygenic risk scores (PRS) in datasets which have minimal expert phenotyping. We apply REGLE to both respiratory and circulatory systems: spirograms which measure lung function and photoplethysmograms (PPG) which measure blood volume changes. Genome-wide association studies on REGLE embeddings identify more genome-wide significant loci than existing methods and replicate known loci for both spirograms and PPG, demonstrating the generality of the framework. Furthermore, these embeddings are associated with overall survival. Finally, we construct a set of PRSs that improve predictive performance of asthma, chronic obstructive pulmonary disease, hypertension, and systolic blood pressure in multiple biobanks. Thus, REGLE embeddings can quantify clinically relevant features that are not currently captured in a standardized or automated way. |
format | Online Article Text |
id | pubmed-10168505 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10168505 2023-05-10 medRxiv Article Cold Spring Harbor Laboratory 2023-08-29 /pmc/articles/PMC10168505/ /pubmed/37163049 http://dx.doi.org/10.1101/2023.04.28.23289285 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
title | Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168505/ https://www.ncbi.nlm.nih.gov/pubmed/37163049 http://dx.doi.org/10.1101/2023.04.28.23289285 |