
Analysis of missense variants in the human genome reveals widespread gene-specific clustering and improves prediction of pathogenicity

We used a machine learning approach to analyze the within-gene distribution of missense variants observed in hereditary conditions and cancer. When applied to 840 genes from the ClinVar database, this approach detected a significant non-random distribution of pathogenic and benign variants in 387 (4...

Bibliographic Details
Main authors: Quinodoz, Mathieu, Peter, Virginie G., Cisarova, Katarina, Royer-Bertrand, Beryl, Stenson, Peter D., Cooper, David N., Unger, Sheila, Superti-Furga, Andrea, Rivolta, Carlo
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8948164/
https://www.ncbi.nlm.nih.gov/pubmed/35120630
http://dx.doi.org/10.1016/j.ajhg.2022.01.006
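
As a rough illustration of what a "non-random within-gene distribution" of variants can mean, the sketch below compares the spacing of a set of variant positions with positions drawn uniformly at random along the protein. This is a simple Monte Carlo test chosen for illustration, not the machine learning approach used by the authors, which the summary does not detail. The function names, the residue positions, and the protein length are all hypothetical.

```python
import random


def mean_nearest_neighbor_distance(positions):
    """Average distance from each position to its closest neighbor."""
    pts = sorted(positions)
    if len(pts) < 2:
        raise ValueError("need at least two positions")
    total = 0
    for i, p in enumerate(pts):
        left = p - pts[i - 1] if i > 0 else float("inf")
        right = pts[i + 1] - p if i < len(pts) - 1 else float("inf")
        total += min(left, right)
    return total / len(pts)


def clustering_p_value(variant_positions, protein_length,
                       n_simulations=2000, seed=0):
    """One-sided Monte Carlo p-value under a uniform null (an assumption).

    Asks how often positions drawn uniformly from 1..protein_length are
    at least as tightly spaced as the observed variant positions.
    """
    rng = random.Random(seed)
    k = len(variant_positions)
    observed = mean_nearest_neighbor_distance(variant_positions)
    as_extreme = 0
    for _ in range(n_simulations):
        simulated = [rng.randint(1, protein_length) for _ in range(k)]
        if mean_nearest_neighbor_distance(simulated) <= observed:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (as_extreme + 1) / (n_simulations + 1)


# Hypothetical residue positions in a hypothetical 1000-residue protein.
clustered = [100, 102, 103, 105, 107, 110, 111, 114]
spread = [50, 180, 300, 420, 560, 690, 810, 940]

p_clustered = clustering_p_value(clustered, protein_length=1000)
p_spread = clustering_p_value(spread, protein_length=1000)
print(f"clustered: p = {p_clustered:.4f}")
print(f"spread:    p = {p_spread:.4f}")
```

A small p-value for the tightly grouped positions and a large one for the evenly spread positions is the kind of contrast the summary refers to. A real analysis would also need to account for multiple testing across genes and for uneven mutability along the sequence, neither of which this sketch attempts.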
author Quinodoz, Mathieu
Peter, Virginie G.
Cisarova, Katarina
Royer-Bertrand, Beryl
Stenson, Peter D.
Cooper, David N.
Unger, Sheila
Superti-Furga, Andrea
Rivolta, Carlo
collection PubMed
description We used a machine learning approach to analyze the within-gene distribution of missense variants observed in hereditary conditions and cancer. When applied to 840 genes from the ClinVar database, this approach detected a significant non-random distribution of pathogenic and benign variants in 387 (46%) and 172 (20%) genes, respectively, revealing that variant clustering is widespread across the human exome. This clustering likely occurs as a consequence of mechanisms shaping pathogenicity at the protein level, as illustrated by the overlap of some clusters with known functional domains. We then took advantage of these findings to develop a pathogenicity predictor, MutScore, that integrates qualitative features of DNA substitutions with the new additional information derived from this positional clustering. Using a random forest approach, MutScore was able to identify pathogenic missense mutations with very high accuracy, outperforming existing predictive tools, especially for variants associated with autosomal-dominant disease and cancer. Thus, the within-gene clustering of pathogenic and benign DNA changes is an important and previously underappreciated feature of the human exome, which can be harnessed to improve the prediction of pathogenicity and disambiguation of DNA variants of uncertain significance.
format Online
Article
Text
id pubmed-8948164
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8948164 2022-03-26 Am J Hum Genet Article Elsevier 2022-03-03 2022-02-03 /pmc/articles/PMC8948164/ /pubmed/35120630 http://dx.doi.org/10.1016/j.ajhg.2022.01.006 Text en © 2022 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Analysis of missense variants in the human genome reveals widespread gene-specific clustering and improves prediction of pathogenicity
topic Article