
Accuracy of a machine learning method based on structural and locational information from AlphaFold2 for predicting the pathogenicity of TARDBP and FUS gene variants in ALS

BACKGROUND: In the sporadic form of amyotrophic lateral sclerosis (ALS), the pathogenicity of rare variants in the causative genes characterizing the familial form remains largely unknown. To predict the pathogenicity of such variants, in silico analysis is commonly used. In some ALS causative genes...


Bibliographic Details
Main authors: Hatano, Yuya; Ishihara, Tomohiko; Onodera, Osamu
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10197232/
https://www.ncbi.nlm.nih.gov/pubmed/37208601
http://dx.doi.org/10.1186/s12859-023-05338-5
author Hatano, Yuya
Ishihara, Tomohiko
Onodera, Osamu
collection PubMed
description BACKGROUND: In the sporadic form of amyotrophic lateral sclerosis (ALS), the pathogenicity of rare variants in the causative genes that characterize the familial form remains largely unknown. In silico analysis is commonly used to predict the pathogenicity of such variants. In some ALS causative genes, the pathogenic variants are concentrated in specific regions, and the resulting alterations in protein structure are thought to significantly affect pathogenicity. However, existing methods have not taken this into account. To address this, we developed a technique termed MOVA (method for evaluating the pathogenicity of missense variants using AlphaFold2), which uses the position of each variant in the protein structure predicted by AlphaFold2. Here we examined the utility of MOVA for analysis of several causative genes of ALS.
METHODS: We analyzed variants of 12 ALS-related genes (TARDBP, FUS, SETX, TBK1, OPTN, SOD1, VCP, SQSTM1, ANG, UBQLN2, DCTN1, and CCNF) and classified them as pathogenic or neutral. For each gene, the features of the variants (their positions in the 3D structure predicted by AlphaFold2, the pLDDT score, and the BLOSUM62 score) were used to train a random forest, which was evaluated by stratified fivefold cross-validation. We compared the accuracy of MOVA in predicting variant pathogenicity with that of other in silico prediction methods and evaluated the prediction accuracy at TARDBP and FUS hotspots. We also examined which of the MOVA features had the greatest impact on pathogenicity discrimination.
RESULTS: MOVA yielded useful results (AUC ≥ 0.70) for 5 of the 12 ALS causative genes: TARDBP, FUS, SOD1, VCP, and UBQLN2. In the comparison with other in silico prediction methods, MOVA obtained the best results for TARDBP, VCP, UBQLN2, and CCNF. MOVA demonstrated superior predictive accuracy for the pathogenicity of mutations at hotspots of TARDBP and FUS. Moreover, higher accuracy was achieved by combining MOVA with REVEL or CADD. Among the features of MOVA, the x, y, and z coordinates performed the best and were highly correlated with the MOVA prediction.
CONCLUSIONS: MOVA is useful for predicting the pathogenicity of rare variants that are concentrated at specific structural sites, and for use in combination with other prediction methods.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05338-5.
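The evaluation protocol summarized in the METHODS above (per-variant features of x, y, z position, pLDDT, and BLOSUM62 score; a random forest; stratified fivefold cross-validation scored by AUC) can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' implementation: the data are synthetic placeholders, and the function names, the hotspot location, the number of trees, and the use of shuffling are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def make_synthetic_variants(n_pathogenic=60, n_neutral=60, seed=0):
    """Build a synthetic feature table for one hypothetical gene (NOT real data).

    Columns: x, y, z (residue position in the predicted structure),
    pLDDT (per-residue confidence), BLOSUM62 substitution score.
    """
    rng = np.random.default_rng(seed)
    # Assumption: pathogenic variants cluster around an arbitrary 3D hotspot.
    hotspot = rng.normal(loc=[10.0, 0.0, -5.0], scale=3.0, size=(n_pathogenic, 3))
    # Assumption: neutral variants are spread widely over the structure.
    spread = rng.normal(loc=[0.0, 0.0, 0.0], scale=10.0, size=(n_neutral, 3))
    xyz = np.vstack([hotspot, spread])
    n = n_pathogenic + n_neutral
    plddt = rng.uniform(30.0, 100.0, size=(n, 1))   # placeholder confidence values
    blosum = rng.integers(-4, 4, size=(n, 1))       # placeholder scores in -4..3
    X = np.hstack([xyz, plddt, blosum])
    y = np.array([1] * n_pathogenic + [0] * n_neutral)  # 1 = pathogenic, 0 = neutral
    return X, y


def cross_validated_auc(X, y, n_splits=5, seed=0):
    """Return the AUC of a random forest on each stratified fold."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for train_idx, test_idx in folds.split(X, y):
        # Hyperparameters here are illustrative, not taken from the paper.
        model = RandomForestClassifier(n_estimators=200, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        # Probability of the pathogenic class is used as the ranking score.
        scores = model.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], scores))
    return aucs


X, y = make_synthetic_variants()
fold_aucs = cross_validated_auc(X, y)
print("per-fold AUC:", [round(a, 3) for a in fold_aucs])
print("mean AUC:", round(float(np.mean(fold_aucs)), 3))
```

Stratification keeps the pathogenic-to-neutral ratio roughly constant across folds, which matters when one class is small for a given gene. With real data, the feature table would be built from the AlphaFold2 model of each protein rather than generated at random.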
format Online
Article
Text
id pubmed-10197232
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10197232 2023-05-20
BMC Bioinformatics
Research Article
BioMed Central 2023-05-19
/pmc/articles/PMC10197232/ /pubmed/37208601 http://dx.doi.org/10.1186/s12859-023-05338-5
Text en
© The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Accuracy of a machine learning method based on structural and locational information from AlphaFold2 for predicting the pathogenicity of TARDBP and FUS gene variants in ALS
topic Research Article