Calculating the statistical significance of rare variants causal for Mendelian and complex disorders

Bibliographic Details
Main Authors: Rao, Aliz R., Nelson, Stanley F.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Technical Advance
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6001062/
https://www.ncbi.nlm.nih.gov/pubmed/29898714
http://dx.doi.org/10.1186/s12920-018-0371-9
author Rao, Aliz R.
Nelson, Stanley F.
collection PubMed
description BACKGROUND: With the expanding use of next-gen sequencing (NGS) to diagnose the thousands of rare Mendelian genetic diseases, it is critical to be able to interpret individual DNA variation. To calculate the significance of finding a rare protein-altering variant in a given gene, one must know the frequency of seeing a variant in the general population that is at least as damaging as the variant in question.
METHODS: We developed a general method to better interpret the likelihood that a rare variant is disease causing if observed in a given gene or genic region mapping to a described protein domain, using genome-wide information from a large control sample. Based on data from 2504 individuals in the 1000 Genomes Project dataset, we calculated the number of individuals who have a rare variant in a given gene for numerous filtering threshold scenarios, which may be used for calculating the significance of an observed rare variant being causal for disease. Additionally, we calculated mutational burden data on the number of individuals with rare variants in genic regions mapping to protein domains.
RESULTS: We describe methods to use the mutational burden data for calculating the significance of observing rare variants in a given proportion of sequenced individuals. We present SORVA, an implementation of these methods as a web tool, and we demonstrate application to 20 relevant but diverse next-gen sequencing studies. Specifically, we calculate the statistical significance of findings involving multi-family studies with rare Mendelian disease and a large-scale study of a complex disorder, autism spectrum disorder. If we use the frequency counts to rank genes based on intolerance for variation, the ranking correlates well with pLI scores derived from the Exome Aggregation Consortium (ExAC) dataset (ρ = 0.515), with the benefit that the scores are directly interpretable.
CONCLUSIONS: We have presented a strategy that is useful for vetting candidate genes from NGS studies and allows researchers to calculate the significance of seeing a variant in a given gene or protein domain. This approach is an important step towards developing a quantitative, statistics-based approach for presenting clinical findings.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-018-0371-9) contains supplementary material, which is available to authorized users.
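The abstract describes using per-gene counts of control individuals with a rare variant to judge how surprising it is to see such variants in a given proportion of sequenced cases. The sketch below illustrates one way this kind of calculation might look, assuming a plain one-sided binomial test. It is not taken from the article and should not be read as the SORVA implementation. The function names and the example counts (5 carriers among controls, 3 of 3 unrelated cases) are hypothetical; only the control cohort size of 2504 comes from the abstract.

```python
from math import comb


def carrier_frequency(carriers: int, cohort_size: int = 2504) -> float:
    """Fraction of control individuals with a qualifying rare variant in a gene.

    The default cohort size is the 2504 individuals mentioned in the abstract;
    the carrier count must come from a real burden table.
    """
    if not 0 <= carriers <= cohort_size:
        raise ValueError("carriers must be between 0 and cohort_size")
    return carriers / cohort_size


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    Assumption: the n sequenced individuals are unrelated, so each one
    independently carries a qualifying variant with probability p.
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 5 of 2504 controls carry a qualifying variant in the
# gene, and all 3 unrelated affected individuals carry one as well.
p = carrier_frequency(5)
p_value = binomial_tail(3, 3, p)
print(f"control carrier frequency = {p:.5f}, p-value = {p_value:.3e}")
```

A gene with no carriers among controls gives a frequency of exactly zero under this sketch, so a real analysis would need some treatment of that case, as well as correction for the number of genes tested.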
format Online
Article
Text
id pubmed-6001062
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6001062 2018-06-26 Calculating the statistical significance of rare variants causal for Mendelian and complex disorders Rao, Aliz R. Nelson, Stanley F. BMC Med Genomics Technical Advance BioMed Central 2018-06-13 /pmc/articles/PMC6001062/ /pubmed/29898714 http://dx.doi.org/10.1186/s12920-018-0371-9 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Calculating the statistical significance of rare variants causal for Mendelian and complex disorders
topic Technical Advance