Integration of multidimensional splicing data and GWAS summary statistics for risk gene discovery

A common strategy for the functional interpretation of genome-wide association study (GWAS) findings has been the integrative analysis of GWAS and expression data. Using this strategy, many association methods (e.g., PrediXcan and FUSION) have been successful in identifying trait-associated genes vi...

Bibliographic Details
Main authors:	Ji, Ying; Wei, Qiang; Chen, Rui; Wang, Quan; Tao, Ran; Li, Bingshan
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9278751/ https://www.ncbi.nlm.nih.gov/pubmed/35771864 http://dx.doi.org/10.1371/journal.pgen.1009814

author	Ji, Ying Wei, Qiang Chen, Rui Wang, Quan Tao, Ran Li, Bingshan
collection	PubMed
description	A common strategy for the functional interpretation of genome-wide association study (GWAS) findings has been the integrative analysis of GWAS and expression data. Using this strategy, many association methods (e.g., PrediXcan and FUSION) have been successful in identifying trait-associated genes via mediating effects on RNA expression. However, these approaches often ignore the effects of splicing, which can carry as much disease risk as expression. Compared to expression data, one challenge in detecting associations with splicing data is the large multiple testing burden due to multidimensional splicing events within genes. Here, we introduce a multidimensional splicing gene (MSG) approach, which consists of two stages: 1) we use sparse canonical correlation analysis (sCCA) to construct latent canonical vectors (CVs) by identifying sparse linear combinations of genetic variants and splicing events that are maximally correlated with each other; and 2) we test for the association between the genetically regulated splicing CVs and the trait of interest using GWAS summary statistics. Simulations show that MSG has proper type I error control and substantial power gains over existing multidimensional expression analysis methods (i.e., S-MultiXcan, UTMOST, and sCCA+ACAT) under diverse scenarios. When applied to the Genotype-Tissue Expression Project data and GWAS summary statistics of 14 complex human traits, MSG identified on average 83%, 115%, and 223% more significant genes than sCCA+ACAT, S-MultiXcan, and UTMOST, respectively. We highlight MSG’s applications to Alzheimer’s disease, low-density lipoprotein cholesterol, and schizophrenia, and find that the majority of MSG-identified genes would have been missed by expression-based analyses. Our results demonstrate that aggregating splicing data through MSG can improve power in identifying gene-trait associations and help better understand the genetic risk of complex traits.
format	Online Article Text
id	pubmed-9278751
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9278751 2022-07-14 Integration of multidimensional splicing data and GWAS summary statistics for risk gene discovery Ji, Ying Wei, Qiang Chen, Rui Wang, Quan Tao, Ran Li, Bingshan PLoS Genet Research Article Public Library of Science 2022-06-30 /pmc/articles/PMC9278751/ /pubmed/35771864 http://dx.doi.org/10.1371/journal.pgen.1009814 Text en © 2022 Ji et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Integration of multidimensional splicing data and GWAS summary statistics for risk gene discovery
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9278751/ https://www.ncbi.nlm.nih.gov/pubmed/35771864 http://dx.doi.org/10.1371/journal.pgen.1009814
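
Illustrative note on stage 2 of the abstract (testing a genetically regulated component against a trait using GWAS summary statistics). The sketch below is not the MSG test itself, whose exact form is not given in this record. It shows the generic single-component statistic used by summary-statistics association methods, Z = w'z / sqrt(w'Rw), where w holds variant weights for one component, z holds GWAS z-scores, and R is the variant correlation (LD) matrix. The function names and the toy numbers are assumptions made for illustration only.

```python
import math


def component_z(weights, gwas_z, ld):
    """Generic summary-statistics association z-score for one component.

    weights : per-variant weights defining the component (hypothetical input)
    gwas_z  : per-variant GWAS z-scores, same order as weights
    ld      : square variant correlation (LD) matrix as nested lists
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z and ld must have matching dimensions")
    # Numerator: weighted sum of GWAS z-scores, w'z.
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Variance of the weighted sum under the null, w'Rw.
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("w'Rw must be positive; check weights and LD matrix")
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Toy example with three variants (made-up numbers, not from the article).
weights = [0.5, 0.0, -0.3]
gwas_z = [2.0, 1.0, -1.0]
ld = [
    [1.0, 0.2, 0.0],
    [0.2, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
z = component_z(weights, gwas_z, ld)
p = two_sided_p(z)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A method that aggregates several components per gene would still need a rule for combining them into one gene-level test; that step is deliberately left out here because the record does not specify it.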
