Fast estimation of genetic correlation for biobank-scale data
Genetic correlation is an important parameter in efforts to understand the relationships among complex traits. Current methods that analyze individual genotype data for estimating genetic correlation are challenging to scale to large datasets. Methods that analyze summary data, while being computati...
Main authors: | Wu, Yue; Burch, Kathryn S.; Ganna, Andrea; Pajukanta, Päivi; Pasaniuc, Bogdan; Sankararaman, Sriram |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8764132/ https://www.ncbi.nlm.nih.gov/pubmed/34861179 http://dx.doi.org/10.1016/j.ajhg.2021.11.015 |
_version_ | 1784634097260298240 |
---|---|
author | Wu, Yue Burch, Kathryn S. Ganna, Andrea Pajukanta, Päivi Pasaniuc, Bogdan Sankararaman, Sriram |
author_sort | Wu, Yue |
collection | PubMed |
description | Genetic correlation is an important parameter in efforts to understand the relationships among complex traits. Current methods that analyze individual genotype data for estimating genetic correlation are challenging to scale to large datasets. Methods that analyze summary data, while being computationally efficient, tend to yield estimates of genetic correlation with reduced precision. We propose SCORE (scalable genetic correlation estimator), a randomized method of moments estimator of genetic correlation that is both scalable and accurate. SCORE obtains more precise estimates of genetic correlations relative to summary-statistic methods that can be applied at scale; it achieves a [Formula: see text] reduction in standard error relative to LD-score regression (LDSC) and a [Formula: see text] reduction relative to high-definition likelihood (HDL) (averaged over all simulations). The efficiency of SCORE enables computation of genetic correlations on the UK Biobank dataset, consisting of [Formula: see text] K individuals and [Formula: see text] K SNPs, in a few hours (orders of magnitude faster than methods that analyze individual data, such as GCTA). Across 780 pairs of traits in [Formula: see text] unrelated white British individuals in the UK Biobank, SCORE identifies significant genetic correlation between 200 additional pairs of traits over LDSC (beyond the 245 pairs identified by both). |
format | Online Article Text |
id | pubmed-8764132 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-8764132 2022-01-20 Fast estimation of genetic correlation for biobank-scale data Wu, Yue Burch, Kathryn S. Ganna, Andrea Pajukanta, Päivi Pasaniuc, Bogdan Sankararaman, Sriram Am J Hum Genet Article Elsevier 2022-01-06 2021-12-02 /pmc/articles/PMC8764132/ /pubmed/34861179 http://dx.doi.org/10.1016/j.ajhg.2021.11.015 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Wu, Yue Burch, Kathryn S. Ganna, Andrea Pajukanta, Päivi Pasaniuc, Bogdan Sankararaman, Sriram Fast estimation of genetic correlation for biobank-scale data |
title | Fast estimation of genetic correlation for biobank-scale data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8764132/ https://www.ncbi.nlm.nih.gov/pubmed/34861179 http://dx.doi.org/10.1016/j.ajhg.2021.11.015 |