
MutTMPredictor: Robust and accurate cascade XGBoost classifier for prediction of mutations in transmembrane proteins

Transmembrane proteins have critical biological functions and play a role in a multitude of cellular processes including cell signaling, transport of molecules and ions across membranes. Approximately 60% of transmembrane proteins are considered as drug targets. Missense mutations in such proteins c...


Bibliographic Details
Main authors: Ge, Fang, Zhu, Yi-Heng, Xu, Jian, Muhammad, Arif, Song, Jiangning, Yu, Dong-Jun
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8649221/
https://www.ncbi.nlm.nih.gov/pubmed/34938415
http://dx.doi.org/10.1016/j.csbj.2021.11.024
_version_ 1784610947062562816
author Ge, Fang
Zhu, Yi-Heng
Xu, Jian
Muhammad, Arif
Song, Jiangning
Yu, Dong-Jun
collection PubMed
description Transmembrane proteins have critical biological functions and play a role in a multitude of cellular processes, including cell signaling and the transport of molecules and ions across membranes. Approximately 60% of transmembrane proteins are considered drug targets. Missense mutations in such proteins can lead to many diverse diseases and disorders, such as neurodegenerative diseases and cystic fibrosis. However, studies on mutations in transmembrane proteins are limited. In this work, we first design a new feature encoding method, termed weight attenuation position-specific scoring matrix (WAPSSM), which builds upon protein evolutionary information. Then, we propose a new mutation prediction algorithm (cascade XGBoost) that draws on ideas from consensus predictors and gcForest. Multi-level experiments illustrate the effectiveness of the WAPSSM and cascade XGBoost algorithms. Finally, based on WAPSSM and three other types of features, in combination with the cascade XGBoost algorithm, we develop a new transmembrane protein mutation predictor, named MutTMPredictor. We benchmark the performance of MutTMPredictor against several existing predictors on seven datasets. On the 546 mutations dataset, MutTMPredictor achieves an accuracy (ACC) of 0.9661 and a Matthews correlation coefficient (MCC) of 0.8950. On the 67,584 dataset, MutTMPredictor achieves an MCC of 0.7523 and an area under the curve (AUC) of 0.8746, which are 0.1625 and 0.0801 higher, respectively, than those of the best existing predictor (fathmm). MutTMPredictor also outperforms two specific predictors on the Pred-MutHTP datasets. The results suggest that MutTMPredictor can be used as an effective method for predicting and prioritizing missense mutations in transmembrane proteins. The MutTMPredictor webserver and datasets are freely accessible at http://csbio.njust.edu.cn/bioinf/muttmpredictor/ for academic use.
format Online
Article
Text
id pubmed-8649221
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-8649221 2021-12-21 Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2021-11-19 /pmc/articles/PMC8649221/ /pubmed/34938415 http://dx.doi.org/10.1016/j.csbj.2021.11.024 Text en © 2021 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title MutTMPredictor: Robust and accurate cascade XGBoost classifier for prediction of mutations in transmembrane proteins
topic Research Article
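The abstract in this record reports accuracy (ACC) and the Matthews correlation coefficient (MCC). As a reading aid, the sketch below shows how these two metrics are conventionally computed from a binary confusion matrix. The function names and the example counts are hypothetical and are not taken from the paper; the record does not give the authors' evaluation code. The last two lines only restate the arithmetic implied by the abstract for the baseline predictor (fathmm), whose scores the abstract does not state directly.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only (not data from the paper).
tp, tn, fp, fn = 90, 80, 10, 20
print(f"ACC = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC = {mcc(tp, tn, fp, fn):.4f}")

# Baseline values implied by the abstract's reported margins on the
# 67,584 dataset (derived here by subtraction, not quoted from the paper).
fathmm_mcc = round(0.7523 - 0.1625, 4)
fathmm_auc = round(0.8746 - 0.0801, 4)
```

Unlike ACC, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are unevenly represented, which is presumably why the abstract reports both.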