DOT: Gene-set analysis by combining decorrelated association statistics

Bibliographic Details
Main authors: Vsevolozhskaya, Olga A.; Shi, Min; Hu, Fengjiao; Zaykin, Dmitri V.
Format: Online article, text
Language: English
Published: Public Library of Science 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7182280/
https://www.ncbi.nlm.nih.gov/pubmed/32287273
http://dx.doi.org/10.1371/journal.pcbi.1007819
collection PubMed
description Historically, the majority of statistical association methods have been designed assuming availability of SNP-level information. However, modern genetic and sequencing data present new challenges to the access and sharing of genotype-phenotype datasets, including the cost of data management and difficulties in consolidating records across research groups. These issues make methods based on SNP-level summary statistics particularly appealing. The most common way to combine statistics is a sum of SNP-level squared scores, possibly weighted, as in burden tests for rare variants. The overall significance of the resulting statistic is evaluated using its distribution under the null hypothesis. Here, we demonstrate that this basic approach can be substantially improved by decorrelating scores prior to their addition, resulting in remarkable power gains in the situations most commonly encountered in practice, namely, under heterogeneity of effect sizes and diversity in pairwise LD. In these situations, the power of the traditional test, based on the added squared scores, quickly reaches a ceiling as the number of variants increases. Thus, the traditional approach does not benefit from information potentially contained in any additional SNPs, while our decorrelation by orthogonal transformation (DOT) method yields a steady gain in power. We present theoretical and computational analyses of both approaches, and reveal the causes behind the sometimes dramatic differences in their respective powers. We showcase DOT by analyzing breast cancer and cleft lip data, in which our method strengthened the levels of previously reported associations and implied the possibility of multiple new alleles that jointly confer disease risk.
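The decorrelate-then-sum idea described above can be illustrated with a small sketch. This is not the authors' implementation; it rests on stated assumptions: the L SNP-level z-scores z are taken to follow N(0, Σ) under the null, where Σ is a positive-definite LD correlation matrix, and the decorrelating matrix is assumed to be the symmetric inverse square root H = QΛ^(-1/2)Qᵀ from the eigendecomposition Σ = QΛQᵀ. The sum of squared decorrelated scores is then chi-square with L degrees of freedom under the null. The helpers `jacobi_eigen`, `chi2_sf`, and `dot_test` are hypothetical names, and the eigendecomposition uses a plain cyclic Jacobi routine only so the block needs nothing beyond the Python standard library.

```python
import math


def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigendecomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V), where column k of V is the eigenvector for
    eigenvalues[k]. The input matrix is not modified.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        # Stop once the sum of squared off-diagonal entries is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    # Start at df = 1 or 2, then step by 2 with
    # Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(x / 2.0))
    else:
        k, q = 2, math.exp(-x / 2.0)
    while k < df:
        q += math.exp((k / 2.0) * math.log(x / 2.0) - x / 2.0 - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def dot_test(z, ld):
    """Decorrelate z-scores with H = Sigma^(-1/2) and sum the squared results.

    z:  SNP-level z-scores (assumed N(0, Sigma) under the null).
    ld: LD correlation matrix Sigma, assumed symmetric positive definite.
    Returns (decorrelated scores, statistic, chi-square p-value with len(z) df).
    """
    n = len(z)
    lam, v = jacobi_eigen(ld)
    if min(lam) <= 1e-10:
        raise ValueError("LD matrix must be positive definite")
    # Symmetric inverse square root: H = V diag(lam^(-1/2)) V^T.
    h = [[sum(v[i][k] * v[j][k] / math.sqrt(lam[k]) for k in range(n))
          for j in range(n)] for i in range(n)]
    y = [sum(h[i][j] * z[j] for j in range(n)) for i in range(n)]
    stat = sum(yi * yi for yi in y)
    return y, stat, chi2_sf(stat, n)


if __name__ == "__main__":
    z = [2.0, 1.0]
    ld = [[1.0, 0.5], [0.5, 1.0]]
    y, stat, p = dot_test(z, ld)
    # Here z' Sigma^-1 z = (4 - 2 + 1) / 0.75, so stat is about 4.0.
    print("decorrelated:", y, "statistic:", stat, "p-value:", p)
```

Because any H with HΣHᵀ = I satisfies HᵀH = Σ⁻¹, the plain sum of squares equals zᵀΣ⁻¹z whichever decorrelating matrix is used; the choice of H matters only for combinations that keep part of the decorrelated statistics, such as the largest ones. By contrast, the traditional statistic Σ zᵢ² (5.0 for the example inputs) is not chi-square with L degrees of freedom when the scores are correlated: under the null it is a sum of independent χ²₁ variables weighted by the eigenvalues of Σ.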
id pubmed-7182280
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7182280 2020-05-05 DOT: Gene-set analysis by combining decorrelated association statistics Vsevolozhskaya, Olga A. Shi, Min Hu, Fengjiao Zaykin, Dmitri V. PLoS Comput Biol Research Article
Public Library of Science 2020-04-14 /pmc/articles/PMC7182280/ /pubmed/32287273 http://dx.doi.org/10.1371/journal.pcbi.1007819 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
topic Research Article