The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity

Bibliographic Details
Main Authors: Petrovski, Slavé, Gussow, Ayal B., Wang, Quanli, Halvorsen, Matt, Han, Yujun, Weir, William H., Allen, Andrew S., Goldstein, David B.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4557908/
https://www.ncbi.nlm.nih.gov/pubmed/26332131
http://dx.doi.org/10.1371/journal.pgen.1005492
author Petrovski, Slavé
Gussow, Ayal B.
Wang, Quanli
Halvorsen, Matt
Han, Yujun
Weir, William H.
Allen, Andrew S.
Goldstein, David B.
collection PubMed
description Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. 
In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find both scores are significantly predictive of human dosage sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease.
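The abstract states only that ncRVIS "compares observed and predicted levels of standing variation" in a gene's regulatory sequence. As a rough illustration of that idea, the sketch below fits a simple least-squares line predicting each gene's common-variant count from its total variant count and reports the standardized residual. The regression form, the function name `residual_intolerance_scores`, and the toy counts are all assumptions made for illustration; this is not the authors' implementation.

```python
from math import sqrt


def residual_intolerance_scores(total_variants, common_variants):
    """Standardized residuals from an ordinary least-squares fit.

    Assumption (not from the record): the "predicted" level of variation is
    a straight-line fit of common-variant counts on total-variant counts.
    A negative score means fewer common variants than predicted, which this
    sketch reads as greater intolerance.
    """
    n = len(total_variants)
    mean_x = sum(total_variants) / n
    mean_y = sum(common_variants) / n
    sxx = sum((x - mean_x) ** 2 for x in total_variants)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(total_variants, common_variants)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Observed minus predicted for each gene.
    residuals = [
        y - (intercept + slope * x)
        for x, y in zip(total_variants, common_variants)
    ]
    # Residual standard error with n - 2 degrees of freedom.
    resid_sd = sqrt(sum(r * r for r in residuals) / (n - 2))
    return [r / resid_sd for r in residuals]


# Hypothetical counts for five genes (made up for illustration only).
toy_total = [10, 20, 30, 40, 50]
toy_common = [2, 4, 6, 8, 3]
toy_scores = residual_intolerance_scores(toy_total, toy_common)
```

In this toy input the last gene carries far fewer common variants than the fitted line predicts, so it receives the most negative score. Note that this is plain standardization of residuals, a simpler stand-in for the studentized residual that a production score might use.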
format Online
Article
Text
id pubmed-4557908
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4557908 2015-09-10 PLoS Genet Research Article Public Library of Science 2015-09-02 /pmc/articles/PMC4557908/ /pubmed/26332131 http://dx.doi.org/10.1371/journal.pgen.1005492 Text en © 2015 Petrovski et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity
topic Research Article