
An Improved Computational Prediction Model for Lysine Succinylation Sites Mapping on Homo sapiens by Fusing Three Sequence Encoding Schemes with the Random Forest Classifier

BACKGROUND: Lysine succinylation is a reversible protein post-translational modification (PTM) that regulates the structure and function of proteins. It plays a significant role in various cellular processes, including some diseases, in humans as well as many other organisms. The accurat...

Bibliographic Details
Main Authors: Tasmia, Samme Amena; Ahmed, Fee Faysal; Mosharaf, Parvez; Hasan, Mehedi; Mollah, Nurul Haque
Format: Online Article Text
Language: English
Published: Bentham Science Publishers 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8188582/
https://www.ncbi.nlm.nih.gov/pubmed/34220299
http://dx.doi.org/10.2174/1389202922666210219114211
_version_ 1783705359036186624
author Tasmia, Samme Amena
Ahmed, Fee Faysal
Mosharaf, Parvez
Hasan, Mehedi
Mollah, Nurul Haque
author_sort Tasmia, Samme Amena
collection PubMed
description BACKGROUND: Lysine succinylation is a reversible protein post-translational modification (PTM) that regulates the structure and function of proteins. It plays a significant role in various cellular processes, including some diseases, in humans as well as many other organisms. Accurate identification of succinylation sites is essential for understanding their biological functions and for drug development. METHODS: In this study, we developed an improved method to predict lysine succinylation sites in Homo sapiens by fusing three encoding schemes, namely binary encoding, the composition of k-spaced amino acid pairs (CKSAAP) and amino acid composition (AAC), with the random forest (RF) classifier. The prediction performance of the proposed RF-based fusion model was compared with that of other candidate models using 20-fold cross-validation (CV) and two independent test datasets collected from two different sources. RESULTS: In CV, the proposed predictor achieved the highest scores: sensitivity (SN) of 0.800, specificity (SP) of 0.902, accuracy (ACC) of 0.919, Matthews correlation coefficient (MCC) of 0.766, partial AUC (pAUC) of 0.163 at a false-positive rate (FPR) of 0.10, and area under the ROC curve (AUC) of 0.958. It also achieved the highest scores on independent test protein set-1 (SN 0.811, SP 0.902, ACC 0.891, MCC 0.629, pAUC 0.139 and AUC 0.921) and on independent test protein set-2 (SN 0.772, SP 0.901, ACC 0.836, MCC 0.677, pAUC 0.141 at FPR = 0.10 and AUC 0.923). It also outperformed all the other existing prediction models. CONCLUSION: These results suggest that the proposed method may be a useful computational resource for predicting lysine succinylation sites in human proteins.
format Online
Article
Text
id pubmed-8188582
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Bentham Science Publishers
record_format MEDLINE/PubMed
spelling pubmed-8188582 2021-08-01 An Improved Computational Prediction Model for Lysine Succinylation Sites Mapping on Homo sapiens by Fusing Three Sequence Encoding Schemes with the Random Forest Classifier Tasmia, Samme Amena Ahmed, Fee Faysal Mosharaf, Parvez Hasan, Mehedi Mollah, Nurul Haque Curr Genomics Article Bentham Science Publishers 2021-02 2021-02 /pmc/articles/PMC8188582/ /pubmed/34220299 http://dx.doi.org/10.2174/1389202922666210219114211 Text en © 2021 Bentham Science Publishers https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article licensed under the terms of the Creative Commons Attribution-Non-Commercial 4.0 International Public License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/legalcode), which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.
title An Improved Computational Prediction Model for Lysine Succinylation Sites Mapping on Homo sapiens by Fusing Three Sequence Encoding Schemes with the Random Forest Classifier
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8188582/
https://www.ncbi.nlm.nih.gov/pubmed/34220299
http://dx.doi.org/10.2174/1389202922666210219114211
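
The abstract in this record describes fusing three sequence encodings (binary, CKSAAP and AAC) into one feature vector for a random forest classifier. The following is a minimal sketch of what such a fusion could look like for a single peptide window centred on a lysine. It is not the authors' implementation: the function names, the example window, the value of k_max, the normalisation choices and the order of concatenation are all assumptions made for illustration, since the record does not state them.

```python
from itertools import product

# The 20 standard amino acids; any other character (e.g. a gap) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
# All 400 ordered residue pairs; pair "XY" gets index 20 * index(X) + index(Y).
PAIR_INDEX = {a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}


def binary_encoding(window):
    """One-hot encode each position: 20 values per residue."""
    features = []
    for residue in window:
        one_hot = [0.0] * len(AMINO_ACIDS)
        if residue in AA_INDEX:
            one_hot[AA_INDEX[residue]] = 1.0
        features.extend(one_hot)
    return features


def aac_encoding(window):
    """Amino acid composition: frequency of each residue type in the window."""
    counts = [0.0] * len(AMINO_ACIDS)
    for residue in window:
        if residue in AA_INDEX:
            counts[AA_INDEX[residue]] += 1.0
    return [c / max(len(window), 1) for c in counts]


def cksaap_encoding(window, k_max):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each k, counts pairs (window[i], window[i + k + 1]) and divides by
    the number of such pairs in the window (an assumed normalisation).
    """
    features = []
    for k in range(k_max + 1):
        counts = [0.0] * len(PAIR_INDEX)
        n_pairs = len(window) - k - 1
        for i in range(max(n_pairs, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in PAIR_INDEX:
                counts[PAIR_INDEX[pair]] += 1.0
        features.extend(c / max(n_pairs, 1) for c in counts)
    return features


def fuse_features(window, k_max=1):
    """Concatenate the three encodings into one vector (assumed order)."""
    return binary_encoding(window) + cksaap_encoding(window, k_max) + aac_encoding(window)


if __name__ == "__main__":
    example_window = "AKA"  # hypothetical toy window, far shorter than a real one
    vector = fuse_features(example_window, k_max=1)
    print(len(vector))  # 3 * 20 binary + 2 * 400 CKSAAP + 20 AAC values
```

In this sketch, the fused vectors for many windows would be stacked row by row into a feature matrix and passed to any random forest implementation. The window length and k_max used in the published model are not given in this record and would need to be taken from the article itself.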