
Data-driven modelling of mutational hotspots and in silico predictors in hypertrophic cardiomyopathy

BACKGROUND: Although rare missense variants in Mendelian disease genes often cluster in specific regions of proteins, it is unclear how to consider this when evaluating the pathogenicity of a gene or variant. Here we introduce methods for gene association and variant interpretation that use this pow...


Bibliographic Details
Main authors: Waring, Adam; Harper, Andrew; Salatino, Silvia; Kramer, Christopher; Neubauer, Stefan; Thomson, Kate; Watkins, Hugh; Farrall, Martin
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8327322/
https://www.ncbi.nlm.nih.gov/pubmed/32732227
http://dx.doi.org/10.1136/jmedgenet-2020-106922
_version_ 1783732051230326784
author Waring, Adam
Harper, Andrew
Salatino, Silvia
Kramer, Christopher
Neubauer, Stefan
Thomson, Kate
Watkins, Hugh
Farrall, Martin
author_sort Waring, Adam
collection PubMed
description BACKGROUND: Although rare missense variants in Mendelian disease genes often cluster in specific regions of proteins, it is unclear how to consider this when evaluating the pathogenicity of a gene or variant. Here we introduce methods for gene association and variant interpretation that use this powerful signal. METHODS: We present statistical methods to detect missense variant clustering (BIN-test) combined with burden information (ClusterBurden). We introduce a flexible generalised additive modelling (GAM) framework to identify mutational hotspots using burden and clustering information (hotspot model) and supplemented by in silico predictors (hotspot+ model). The methods were applied to synthetic data and a case–control dataset, comprising 5338 hypertrophic cardiomyopathy patients and 125 748 population reference samples over 34 putative cardiomyopathy genes. RESULTS: In simulations, the BIN-test was almost twice as powerful as the Anderson-Darling or Kolmogorov-Smirnov tests; ClusterBurden was computationally faster and more powerful than alternative position-informed methods. For 6/8 sarcomeric genes with strong clustering, ClusterBurden showed enhanced power over burden-alone, equivalent to increasing the sample size by 50%. Hotspot+ models that combine burden, clustering and in silico predictors outperform generic pathogenicity predictors and effectively integrate ACMG criteria PM1 and PP3 to yield strong or moderate evidence of pathogenicity for 31.8% of examined variants of uncertain significance. CONCLUSION: GAMs represent a unified statistical modelling framework to combine burden, clustering and functional information. Hotspot models can refine maps of regional burden and hotspot+ models can be powerful predictors of variant pathogenicity. The BIN-test is a fast, powerful approach to detect missense variant clustering that, when combined with burden information (ClusterBurden), may enhance disease-gene discovery.
format Online
Article
Text
id pubmed-8327322
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-8327322 2021-08-19 Data-driven modelling of mutational hotspots and in silico predictors in hypertrophic cardiomyopathy Waring, Adam Harper, Andrew Salatino, Silvia Kramer, Christopher Neubauer, Stefan Thomson, Kate Watkins, Hugh Farrall, Martin J Med Genet Methods BMJ Publishing Group 2021-08 2020-07-30 /pmc/articles/PMC8327322/ /pubmed/32732227 http://dx.doi.org/10.1136/jmedgenet-2020-106922 Text en © Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY. Published by BMJ. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.
title Data-driven modelling of mutational hotspots and in silico predictors in hypertrophic cardiomyopathy
topic Methods