
Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks

The popularization of biobanks provides an unprecedented amount of genetic and phenotypic information that can be used to research the relationship between genetics and human health. Despite the opportunities these datasets provide, they also pose many problems associated with computational time and...


Bibliographic Details
Main authors: Wolf, Jack M., Barnard, Martha, Xia, Xueting, Ryder, Nathan, Westra, Jason, Tintle, Nathan
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907735/
https://www.ncbi.nlm.nih.gov/pubmed/31797641
_version_ 1783478586017054720
author Wolf, Jack M.
Barnard, Martha
Xia, Xueting
Ryder, Nathan
Westra, Jason
Tintle, Nathan
collection PubMed
description The popularization of biobanks provides an unprecedented amount of genetic and phenotypic information that can be used to research the relationship between genetics and human health. Despite the opportunities these datasets provide, they also pose many problems associated with computational time and costs, data size and transfer, and privacy and security. The publishing of summary statistics from these biobanks, and the use of them in a variety of downstream statistical analyses, alleviates many of these logistical problems. However, major questions remain about how to use summary statistics in all but the simplest downstream applications. Here, we present a novel approach to utilize basic summary statistics (estimates from single marker regressions on single phenotypes) to evaluate more complex phenotypes using multivariate methods. In particular, we present a covariate-adjusted method for conducting principal component analysis (PCA) utilizing only biobank summary statistics. We validate exact formulas for this method, as well as provide a framework of estimation when specific summary statistics are not available, through simulation. We apply our method to a real data set of fatty acid and genomic data.
format Online
Article
Text
id pubmed-6907735
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-6907735 2020-01-01 Pac Symp Biocomput Article 2020 /pmc/articles/PMC6907735/ /pubmed/31797641 Text en http://creativecommons.org/licenses/by/4.0/ Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License.
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907735/
https://www.ncbi.nlm.nih.gov/pubmed/31797641