
MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction

Understanding the pathogenicity of missense mutations (MMs) is essential for shedding light on genetic diseases, gene functions, and individual variations. In this study, we propose a novel computational approach, called MMPatho, for enhancing missense mutation pathogenic prediction. Fir...


Bibliographic Details
Main authors: Ge, Fang; Arif, Muhammad; Yan, Zihao; Alahmadi, Hanin; Worachartcheewan, Apilak; Yu, Dong-Jun; Shoombuatong, Watshara
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10685454/
https://www.ncbi.nlm.nih.gov/pubmed/37947586
http://dx.doi.org/10.1021/acs.jcim.3c00950
_version_ 1785151634533253120
author Ge, Fang
Arif, Muhammad
Yan, Zihao
Alahmadi, Hanin
Worachartcheewan, Apilak
Yu, Dong-Jun
Shoombuatong, Watshara
author_facet Ge, Fang
Arif, Muhammad
Yan, Zihao
Alahmadi, Hanin
Worachartcheewan, Apilak
Yu, Dong-Jun
Shoombuatong, Watshara
author_sort Ge, Fang
collection PubMed
description Understanding the pathogenicity of missense mutations (MMs) is essential for shedding light on genetic diseases, gene functions, and individual variations. In this study, we propose a novel computational approach, called MMPatho, for enhancing missense mutation pathogenic prediction. First, we established a large-scale nonredundant MM benchmark data set based on the entire Ensembl database, complemented by a focused blind test set specifically for pathogenic GOF/LOF MMs. Based on this data set, for each mutation, we used Ensembl VEP v104 and dbNSFP v4.1a to extract variant-level, amino acid-level, individuals’ outputs, and genome-level features. Additionally, protein sequences were generated using ENSP identifiers with the Ensembl API and then encoded. The mutant sites’ ESM-1b and ProtTrans-T5 embeddings were subsequently extracted. Building on these efforts, we developed our model group (MMPatho), which comprises ConsMM and EvoIndMM. Specifically, ConsMM employs individuals’ outputs and XGBoost with SHAP explanation analysis, while EvoIndMM investigates the potential enhancement of predictive capability by incorporating evolutionary information from the embeddings of two large protein language models, ESM-1b and ProtT5-XL-U50. In comparative experiments, ConsMM and EvoIndMM achieved high AUROC (0.9836 and 0.9854) and AUPR (0.9852 and 0.9902) values on the blind test set, which contains no variations or proteins overlapping with the training data, thus highlighting the superiority of our computational approach in the prediction of MM pathogenicity. Our web server, available at http://csbio.njust.edu.cn/bioinf/mmpatho/, allows researchers to predict the pathogenicity (alongside the reliability index score) of MMs using the ConsMM and EvoIndMM models and provides extensive annotations for user input. Additionally, the newly constructed benchmark data set and blind test set can be accessed via the data page of our web server.
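The AUROC and AUPR figures quoted in the description can be illustrated with a minimal sketch of how such metrics are computed from labels and predicted scores. This is not the authors' evaluation code: the function names, the toy labels and scores, and the choice of average precision as the AUPR estimate are all assumptions made for illustration.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, one common step-wise estimate of the area under
    the precision-recall curve. Assumes scores are not tied."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank variants from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = pathogenic, 0 = benign.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

print(f"AUROC = {auroc(toy_labels, toy_scores):.4f}")
print(f"AUPR (average precision) = {average_precision(toy_labels, toy_scores):.4f}")
```

The pairwise AUROC loop is quadratic in the number of variants, which is fine for a toy example; a rank-based implementation would be preferable for a data set of realistic size.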
format Online
Article
Text
id pubmed-10685454
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10685454 2023-11-30 MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction Ge, Fang Arif, Muhammad Yan, Zihao Alahmadi, Hanin Worachartcheewan, Apilak Yu, Dong-Jun Shoombuatong, Watshara J Chem Inf Model Understanding the pathogenicity of missense mutations (MMs) is essential for shedding light on genetic diseases, gene functions, and individual variations. In this study, we propose a novel computational approach, called MMPatho, for enhancing missense mutation pathogenic prediction. First, we established a large-scale nonredundant MM benchmark data set based on the entire Ensembl database, complemented by a focused blind test set specifically for pathogenic GOF/LOF MMs. Based on this data set, for each mutation, we used Ensembl VEP v104 and dbNSFP v4.1a to extract variant-level, amino acid-level, individuals’ outputs, and genome-level features. Additionally, protein sequences were generated using ENSP identifiers with the Ensembl API and then encoded. The mutant sites’ ESM-1b and ProtTrans-T5 embeddings were subsequently extracted. Building on these efforts, we developed our model group (MMPatho), which comprises ConsMM and EvoIndMM. Specifically, ConsMM employs individuals’ outputs and XGBoost with SHAP explanation analysis, while EvoIndMM investigates the potential enhancement of predictive capability by incorporating evolutionary information from the embeddings of two large protein language models, ESM-1b and ProtT5-XL-U50. In comparative experiments, ConsMM and EvoIndMM achieved high AUROC (0.9836 and 0.9854) and AUPR (0.9852 and 0.9902) values on the blind test set, which contains no variations or proteins overlapping with the training data, thus highlighting the superiority of our computational approach in the prediction of MM pathogenicity. Our web server, available at http://csbio.njust.edu.cn/bioinf/mmpatho/, allows researchers to predict the pathogenicity (alongside the reliability index score) of MMs using the ConsMM and EvoIndMM models and provides extensive annotations for user input. Additionally, the newly constructed benchmark data set and blind test set can be accessed via the data page of our web server. American Chemical Society 2023-11-10 /pmc/articles/PMC10685454/ /pubmed/37947586 http://dx.doi.org/10.1021/acs.jcim.3c00950 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Ge, Fang
Arif, Muhammad
Yan, Zihao
Alahmadi, Hanin
Worachartcheewan, Apilak
Yu, Dong-Jun
Shoombuatong, Watshara
MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction
title MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction
title_full MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction
title_fullStr MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction
title_full_unstemmed MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction
title_short MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction
title_sort mmpatho: leveraging multilevel consensus and evolutionary information for enhanced missense mutation pathogenic prediction
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10685454/
https://www.ncbi.nlm.nih.gov/pubmed/37947586
http://dx.doi.org/10.1021/acs.jcim.3c00950
work_keys_str_mv AT gefang mmpatholeveragingmultilevelconsensusandevolutionaryinformationforenhancedmissensemutationpathogenicprediction
AT arifmuhammad mmpatholeveragingmultilevelconsensusandevolutionaryinformationforenhancedmissensemutationpathogenicprediction
AT yanzihao mmpatholeveragingmultilevelconsensusandevolutionaryinformationforenhancedmissensemutationpathogenicprediction
AT alahmadihanin mmpatholeveragingmultilevelconsensusandevolutionaryinformationforenhancedmissensemutationpathogenicprediction
AT worachartcheewanapilak mmpatholeveragingmultilevelconsensusandevolutionaryinformationforenhancedmissensemutationpathogenicprediction
AT yudongjun mmpatholeveragingmultilevelconsensusandevolutionaryinformationforenhancedmissensemutationpathogenicprediction
AT shoombuatongwatshara mmpatholeveragingmultilevelconsensusandevolutionaryinformationforenhancedmissensemutationpathogenicprediction