Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks
The popularization of biobanks provides an unprecedented amount of genetic and phenotypic information that can be used to research the relationship between genetics and human health. Despite the opportunities these datasets provide, they also pose many problems associated with computational time and...
Main Authors: | Wolf, Jack M.; Barnard, Martha; Xia, Xueting; Ryder, Nathan; Westra, Jason; Tintle, Nathan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907735/ https://www.ncbi.nlm.nih.gov/pubmed/31797641 |
_version_ | 1783478586017054720 |
---|---|
author | Wolf, Jack M. Barnard, Martha Xia, Xueting Ryder, Nathan Westra, Jason Tintle, Nathan |
author_facet | Wolf, Jack M. Barnard, Martha Xia, Xueting Ryder, Nathan Westra, Jason Tintle, Nathan |
author_sort | Wolf, Jack M. |
collection | PubMed |
description | The popularization of biobanks provides an unprecedented amount of genetic and phenotypic information that can be used to research the relationship between genetics and human health. Despite the opportunities these datasets provide, they also pose many problems associated with computational time and costs, data size and transfer, and privacy and security. The publishing of summary statistics from these biobanks, and the use of them in a variety of downstream statistical analyses, alleviates many of these logistical problems. However, major questions remain about how to use summary statistics in all but the simplest downstream applications. Here, we present a novel approach to utilize basic summary statistics (estimates from single marker regressions on single phenotypes) to evaluate more complex phenotypes using multivariate methods. In particular, we present a covariate-adjusted method for conducting principal component analysis (PCA) utilizing only biobank summary statistics. We validate exact formulas for this method, as well as provide a framework of estimation when specific summary statistics are not available, through simulation. We apply our method to a real data set of fatty acid and genomic data. |
format | Online Article Text |
id | pubmed-6907735 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-69077352020-01-01 Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks Wolf, Jack M. Barnard, Martha Xia, Xueting Ryder, Nathan Westra, Jason Tintle, Nathan Pac Symp Biocomput Article The popularization of biobanks provides an unprecedented amount of genetic and phenotypic information that can be used to research the relationship between genetics and human health. Despite the opportunities these datasets provide, they also pose many problems associated with computational time and costs, data size and transfer, and privacy and security. The publishing of summary statistics from these biobanks, and the use of them in a variety of downstream statistical analyses, alleviates many of these logistical problems. However, major questions remain about how to use summary statistics in all but the simplest downstream applications. Here, we present a novel approach to utilize basic summary statistics (estimates from single marker regressions on single phenotypes) to evaluate more complex phenotypes using multivariate methods. In particular, we present a covariate-adjusted method for conducting principal component analysis (PCA) utilizing only biobank summary statistics. We validate exact formulas for this method, as well as provide a framework of estimation when specific summary statistics are not available, through simulation. We apply our method to a real data set of fatty acid and genomic data. 2020 /pmc/articles/PMC6907735/ /pubmed/31797641 Text en http://creativecommons.org/licenses/by/4.0/ Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License. |
spellingShingle | Article Wolf, Jack M. Barnard, Martha Xia, Xueting Ryder, Nathan Westra, Jason Tintle, Nathan Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks |
title | Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907735/ https://www.ncbi.nlm.nih.gov/pubmed/31797641 |