Genome-wide analysis of common and rare variants via multiple knockoffs at biobank scale, with an application to Alzheimer disease genetics

Knockoff-based methods have become increasingly popular due to their enhanced power for locus discovery and their ability to prioritize putative causal variants in a genome-wide analysis. However, because of the substantial computational cost for generating knockoffs, existing knockoff approaches ca...

Bibliographic Details
Main authors: He, Zihuai, Le Guen, Yann, Liu, Linxi, Lee, Justin, Ma, Shiyang, Yang, Andrew C., Liu, Xiaoxia, Rutledge, Jarod, Losada, Patricia Moran, Song, Bowen, Belloy, Michael E., Butler, Robert R., Longo, Frank M., Tang, Hua, Mormino, Elizabeth C., Wyss-Coray, Tony, Greicius, Michael D., Ionita-Laza, Iuliana
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8715147/
https://www.ncbi.nlm.nih.gov/pubmed/34767756
http://dx.doi.org/10.1016/j.ajhg.2021.10.009
author He, Zihuai
Le Guen, Yann
Liu, Linxi
Lee, Justin
Ma, Shiyang
Yang, Andrew C.
Liu, Xiaoxia
Rutledge, Jarod
Losada, Patricia Moran
Song, Bowen
Belloy, Michael E.
Butler, Robert R.
Longo, Frank M.
Tang, Hua
Mormino, Elizabeth C.
Wyss-Coray, Tony
Greicius, Michael D.
Ionita-Laza, Iuliana
author_sort He, Zihuai
collection PubMed
description Knockoff-based methods have become increasingly popular due to their enhanced power for locus discovery and their ability to prioritize putative causal variants in a genome-wide analysis. However, because of the substantial computational cost for generating knockoffs, existing knockoff approaches cannot analyze millions of rare genetic variants in biobank-scale whole-genome sequencing and whole-genome imputed datasets. We propose a scalable knockoff-based method for the analysis of common and rare variants across the genome, KnockoffScreen-AL, that is applicable to biobank-scale studies with hundreds of thousands of samples and millions of genetic variants. The application of KnockoffScreen-AL to the analysis of Alzheimer disease (AD) in 388,051 WG-imputed samples from the UK Biobank resulted in 31 significant loci, including 14 loci that are missed by conventional association tests on these data. We perform replication studies in an independent meta-analysis of clinically diagnosed AD with 94,437 samples, and additionally leverage single-cell RNA-sequencing data with 143,793 single-nucleus transcriptomes from 17 control subjects and AD-affected individuals, and proteomics data from 735 control subjects and affected individuals with AD and related disorders to validate the genes at these significant loci. These multi-omics analyses show that 79.1% of the proximal genes at these loci and 76.2% of the genes at loci identified only by KnockoffScreen-AL exhibit at least suggestive signal (p < 0.05) in the scRNA-seq or proteomics analyses. We highlight a potentially causal gene in AD progression, EGFR, that shows significant differences in expression and protein levels between AD-affected individuals and healthy control subjects.
format Online
Article
Text
id pubmed-8715147
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8715147 2022-01-12 Genome-wide analysis of common and rare variants via multiple knockoffs at biobank scale, with an application to Alzheimer disease genetics He, Zihuai Le Guen, Yann Liu, Linxi Lee, Justin Ma, Shiyang Yang, Andrew C. Liu, Xiaoxia Rutledge, Jarod Losada, Patricia Moran Song, Bowen Belloy, Michael E. Butler, Robert R. Longo, Frank M. Tang, Hua Mormino, Elizabeth C. Wyss-Coray, Tony Greicius, Michael D. Ionita-Laza, Iuliana Am J Hum Genet Article Knockoff-based methods have become increasingly popular due to their enhanced power for locus discovery and their ability to prioritize putative causal variants in a genome-wide analysis. However, because of the substantial computational cost for generating knockoffs, existing knockoff approaches cannot analyze millions of rare genetic variants in biobank-scale whole-genome sequencing and whole-genome imputed datasets. We propose a scalable knockoff-based method for the analysis of common and rare variants across the genome, KnockoffScreen-AL, that is applicable to biobank-scale studies with hundreds of thousands of samples and millions of genetic variants. The application of KnockoffScreen-AL to the analysis of Alzheimer disease (AD) in 388,051 WG-imputed samples from the UK Biobank resulted in 31 significant loci, including 14 loci that are missed by conventional association tests on these data. We perform replication studies in an independent meta-analysis of clinically diagnosed AD with 94,437 samples, and additionally leverage single-cell RNA-sequencing data with 143,793 single-nucleus transcriptomes from 17 control subjects and AD-affected individuals, and proteomics data from 735 control subjects and affected individuals with AD and related disorders to validate the genes at these significant loci. These multi-omics analyses show that 79.1% of the proximal genes at these loci and 76.2% of the genes at loci identified only by KnockoffScreen-AL exhibit at least suggestive signal (p < 0.05) in the scRNA-seq or proteomics analyses. We highlight a potentially causal gene in AD progression, EGFR, that shows significant differences in expression and protein levels between AD-affected individuals and healthy control subjects. Elsevier 2021-12-02 2021-11-11 /pmc/articles/PMC8715147/ /pubmed/34767756 http://dx.doi.org/10.1016/j.ajhg.2021.10.009 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Genome-wide analysis of common and rare variants via multiple knockoffs at biobank scale, with an application to Alzheimer disease genetics
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8715147/
https://www.ncbi.nlm.nih.gov/pubmed/34767756
http://dx.doi.org/10.1016/j.ajhg.2021.10.009