
Predicting the pathogenicity of missense variants using features derived from AlphaFold2

MOTIVATION: Missense variants are a frequent class of variation within the coding genome, and some of them cause Mendelian diseases. Despite advances in computational prediction, classifying missense variants into pathogenic or benign remains a major challenge in the context of personalized medicine...


Bibliographic Details
Main Authors: Schmidt, Axel; Röner, Sebastian; Mai, Karola; Klinkhammer, Hannah; Kircher, Martin; Ludwig, Kerstin U
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10203375/
https://www.ncbi.nlm.nih.gov/pubmed/37084271
http://dx.doi.org/10.1093/bioinformatics/btad280
_version_ 1785045616589537280
author Schmidt, Axel
Röner, Sebastian
Mai, Karola
Klinkhammer, Hannah
Kircher, Martin
Ludwig, Kerstin U
author_facet Schmidt, Axel
Röner, Sebastian
Mai, Karola
Klinkhammer, Hannah
Kircher, Martin
Ludwig, Kerstin U
author_sort Schmidt, Axel
collection PubMed
description MOTIVATION: Missense variants are a frequent class of variation within the coding genome, and some of them cause Mendelian diseases. Despite advances in computational prediction, classifying missense variants into pathogenic or benign remains a major challenge in the context of personalized medicine. Recently, the structure of the human proteome was derived with unprecedented accuracy using the artificial intelligence system AlphaFold2. This raises the question of whether AlphaFold2 wild-type structures can improve the accuracy of computational pathogenicity prediction for missense variants. RESULTS: To address this, we first engineered a set of features for each amino acid from these structures. We then trained a random forest to distinguish between relatively common (proxy-benign) and singleton (proxy-pathogenic) missense variants from gnomAD v3.1. This yielded a novel AlphaFold2-based pathogenicity prediction score, termed AlphScore. Important feature classes used by AlphScore are solvent accessibility, amino acid network related features, features describing the physicochemical environment, and AlphaFold2’s quality parameter (predicted local distance difference test). AlphScore alone showed lower performance than existing in silico scores used for missense prediction, such as CADD or REVEL. However, when AlphScore was added to those scores, the performance increased, as measured by the approximation of deep mutational scan data, as well as the prediction of expert-curated missense variants from the ClinVar database. Overall, our data indicate that the integration of AlphaFold2-predicted structures can improve pathogenicity prediction of missense variants. AVAILABILITY AND IMPLEMENTATION: AlphScore, combinations of AlphScore with existing scores, as well as variants used for training and testing are publicly available.
format Online
Article
Text
id pubmed-10203375
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-102033752023-05-24 Predicting the pathogenicity of missense variants using features derived from AlphaFold2 Schmidt, Axel Röner, Sebastian Mai, Karola Klinkhammer, Hannah Kircher, Martin Ludwig, Kerstin U Bioinformatics Original Paper MOTIVATION: Missense variants are a frequent class of variation within the coding genome, and some of them cause Mendelian diseases. Despite advances in computational prediction, classifying missense variants into pathogenic or benign remains a major challenge in the context of personalized medicine. Recently, the structure of the human proteome was derived with unprecedented accuracy using the artificial intelligence system AlphaFold2. This raises the question of whether AlphaFold2 wild-type structures can improve the accuracy of computational pathogenicity prediction for missense variants. RESULTS: To address this, we first engineered a set of features for each amino acid from these structures. We then trained a random forest to distinguish between relatively common (proxy-benign) and singleton (proxy-pathogenic) missense variants from gnomAD v3.1. This yielded a novel AlphaFold2-based pathogenicity prediction score, termed AlphScore. Important feature classes used by AlphScore are solvent accessibility, amino acid network related features, features describing the physicochemical environment, and AlphaFold2’s quality parameter (predicted local distance difference test). AlphScore alone showed lower performance than existing in silico scores used for missense prediction, such as CADD or REVEL. However, when AlphScore was added to those scores, the performance increased, as measured by the approximation of deep mutational scan data, as well as the prediction of expert-curated missense variants from the ClinVar database. Overall, our data indicate that the integration of AlphaFold2-predicted structures can improve pathogenicity prediction of missense variants. 
AVAILABILITY AND IMPLEMENTATION: AlphScore, combinations of AlphScore with existing scores, as well as variants used for training and testing are publicly available. Oxford University Press 2023-04-21 /pmc/articles/PMC10203375/ /pubmed/37084271 http://dx.doi.org/10.1093/bioinformatics/btad280 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Paper
Schmidt, Axel
Röner, Sebastian
Mai, Karola
Klinkhammer, Hannah
Kircher, Martin
Ludwig, Kerstin U
Predicting the pathogenicity of missense variants using features derived from AlphaFold2
title Predicting the pathogenicity of missense variants using features derived from AlphaFold2
title_full Predicting the pathogenicity of missense variants using features derived from AlphaFold2
title_fullStr Predicting the pathogenicity of missense variants using features derived from AlphaFold2
title_full_unstemmed Predicting the pathogenicity of missense variants using features derived from AlphaFold2
title_short Predicting the pathogenicity of missense variants using features derived from AlphaFold2
title_sort predicting the pathogenicity of missense variants using features derived from alphafold2
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10203375/
https://www.ncbi.nlm.nih.gov/pubmed/37084271
http://dx.doi.org/10.1093/bioinformatics/btad280
work_keys_str_mv AT schmidtaxel predictingthepathogenicityofmissensevariantsusingfeaturesderivedfromalphafold2
AT ronersebastian predictingthepathogenicityofmissensevariantsusingfeaturesderivedfromalphafold2
AT maikarola predictingthepathogenicityofmissensevariantsusingfeaturesderivedfromalphafold2
AT klinkhammerhannah predictingthepathogenicityofmissensevariantsusingfeaturesderivedfromalphafold2
AT kirchermartin predictingthepathogenicityofmissensevariantsusingfeaturesderivedfromalphafold2
AT ludwigkerstinu predictingthepathogenicityofmissensevariantsusingfeaturesderivedfromalphafold2
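
Illustrative note: the abstract in this record describes training on proxy labels, with singleton gnomAD missense variants treated as proxy-pathogenic and relatively common ones as proxy-benign. The following is a minimal sketch of how such labels might be assigned, not the authors' implementation. The `Variant` class, the `proxy_label` function, the example identifiers, and in particular the frequency cutoff `COMMON_AF_THRESHOLD` are assumptions made for illustration; the abstract does not state the cutoff used for "relatively common".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff: the abstract says only "relatively common",
# so this value is an illustrative assumption, not the authors' threshold.
COMMON_AF_THRESHOLD = 0.001


@dataclass(frozen=True)
class Variant:
    variant_id: str     # hypothetical identifier for the example
    allele_count: int   # number of observed alternate alleles
    allele_number: int  # number of alleles genotyped at the site


def proxy_label(v: Variant, common_af: float = COMMON_AF_THRESHOLD) -> Optional[int]:
    """Return 1 (proxy-pathogenic), 0 (proxy-benign), or None (excluded)."""
    if v.allele_number <= 0 or v.allele_count <= 0:
        return None  # no usable observation
    if v.allele_count == 1:
        return 1     # singleton: seen exactly once in the cohort
    if v.allele_count / v.allele_number >= common_af:
        return 0     # allele frequency at or above the assumed cutoff
    return None      # intermediate frequency: left out of training


# Made-up example variants, not real gnomAD records.
variants = [
    Variant("v1", 1, 150_000),    # singleton
    Variant("v2", 300, 150_000),  # allele frequency 0.002
    Variant("v3", 5, 150_000),    # rare but not a singleton
]
labels = {v.variant_id: proxy_label(v) for v in variants}
```

In a setup like this, variants labelled `None` would simply be dropped before a classifier such as a random forest is fitted on the per-residue structural features.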