An Improved Computational Prediction Model for Lysine Succinylation Sites Mapping on Homo sapiens by Fusing Three Sequence Encoding Schemes with the Random Forest Classifier
BACKGROUND: Lysine succinylation is a reversible protein post-translational modification (PTM) that regulates the structure and function of proteins. It plays a significant role in cellular physiology and in some diseases, in humans as well as many other organisms. The accurat...
Main authors: | Tasmia, Samme Amena; Ahmed, Fee Faysal; Mosharaf, Parvez; Hasan, Mehedi; Mollah, Nurul Haque |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Bentham Science Publishers, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8188582/ https://www.ncbi.nlm.nih.gov/pubmed/34220299 http://dx.doi.org/10.2174/1389202922666210219114211 |
_version_ | 1783705359036186624 |
---|---|
author | Tasmia, Samme Amena; Ahmed, Fee Faysal; Mosharaf, Parvez; Hasan, Mehedi; Mollah, Nurul Haque |
author_sort | Tasmia, Samme Amena |
collection | PubMed |
description | BACKGROUND: Lysine succinylation is a reversible protein post-translational modification (PTM) that regulates the structure and function of proteins. It plays a significant role in cellular physiology and in some diseases, in humans as well as many other organisms. Accurate identification of succinylation sites is essential for understanding their biological functions and for drug development. METHODS: In this study, we developed an improved method to predict lysine succinylation sites in Homo sapiens by fusing three encoding schemes, namely binary encoding, the composition of k-spaced amino acid pairs (CKSAAP), and amino acid composition (AAC), with the random forest (RF) classifier. The prediction performance of the proposed RF-based fusion model was compared with that of other candidate models using 20-fold cross-validation (CV) and two independent test datasets collected from two different sources. RESULTS: In CV, the proposed predictor achieved the highest scores: sensitivity (SN) 0.800, specificity (SP) 0.902, accuracy (ACC) 0.919, Matthews correlation coefficient (MCC) 0.766, partial AUC (pAUC) 0.163 at a false-positive rate (FPR) of 0.10, and area under the ROC curve (AUC) 0.958. It also achieved the highest scores on independent test protein set-1 (SN 0.811, SP 0.902, ACC 0.891, MCC 0.629, pAUC 0.139, AUC 0.921) and on independent test protein set-2 (SN 0.772, SP 0.901, ACC 0.836, MCC 0.677, pAUC 0.141 at FPR = 0.10, AUC 0.923). It outperformed all the other existing prediction models. CONCLUSION: These results suggest that the proposed method may be a useful computational resource for predicting lysine succinylation sites in humans. |
format | Online Article Text |
id | pubmed-8188582 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Bentham Science Publishers |
record_format | MEDLINE/PubMed |
spelling | pubmed-8188582 2021-08-01 An Improved Computational Prediction Model for Lysine Succinylation Sites Mapping on Homo sapiens by Fusing Three Sequence Encoding Schemes with the Random Forest Classifier Tasmia, Samme Amena; Ahmed, Fee Faysal; Mosharaf, Parvez; Hasan, Mehedi; Mollah, Nurul Haque Curr Genomics Article BACKGROUND: Lysine succinylation is a reversible protein post-translational modification (PTM) that regulates the structure and function of proteins. It plays a significant role in cellular physiology and in some diseases, in humans as well as many other organisms. Accurate identification of succinylation sites is essential for understanding their biological functions and for drug development. METHODS: In this study, we developed an improved method to predict lysine succinylation sites in Homo sapiens by fusing three encoding schemes, namely binary encoding, the composition of k-spaced amino acid pairs (CKSAAP), and amino acid composition (AAC), with the random forest (RF) classifier. The prediction performance of the proposed RF-based fusion model was compared with that of other candidate models using 20-fold cross-validation (CV) and two independent test datasets collected from two different sources. RESULTS: In CV, the proposed predictor achieved the highest scores: sensitivity (SN) 0.800, specificity (SP) 0.902, accuracy (ACC) 0.919, Matthews correlation coefficient (MCC) 0.766, partial AUC (pAUC) 0.163 at a false-positive rate (FPR) of 0.10, and area under the ROC curve (AUC) 0.958. It also achieved the highest scores on independent test protein set-1 (SN 0.811, SP 0.902, ACC 0.891, MCC 0.629, pAUC 0.139, AUC 0.921) and on independent test protein set-2 (SN 0.772, SP 0.901, ACC 0.836, MCC 0.677, pAUC 0.141 at FPR = 0.10, AUC 0.923). It outperformed all the other existing prediction models. CONCLUSION: These results suggest that the proposed method may be a useful computational resource for predicting lysine succinylation sites in humans. Bentham Science Publishers 2021-02 2021-02 /pmc/articles/PMC8188582/ /pubmed/34220299 http://dx.doi.org/10.2174/1389202922666210219114211 Text en © 2021 Bentham Science Publishers https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article licensed under the terms of the Creative Commons Attribution-Non-Commercial 4.0 International Public License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/legalcode), which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited. |
title | An Improved Computational Prediction Model for Lysine Succinylation Sites Mapping on Homo sapiens by Fusing Three Sequence Encoding Schemes with the Random Forest Classifier |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8188582/ https://www.ncbi.nlm.nih.gov/pubmed/34220299 http://dx.doi.org/10.2174/1389202922666210219114211 |
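
The description above names three sequence encodings (binary, CKSAAP, and AAC) that are fused into one feature vector for a random forest classifier. The following is a minimal sketch of how such a fusion could look, not the authors' implementation. The window string, the gap symbol `-`, the range of k values (`k_max`), and all function names are assumptions made for illustration; the record does not state them.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# Assumption: any character outside AMINO_ACIDS (for example "-") is padding.


def binary_encoding(window):
    """One-hot encode each position: 20 values per residue, all zeros for padding."""
    features = []
    for residue in window:
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return features


def aac_encoding(window):
    """Amino acid composition: frequency of each residue among non-padding positions."""
    residues = [r for r in window if r in AMINO_ACIDS]
    total = len(residues) or 1  # avoid division by zero for an all-padding window
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


def cksaap_encoding(window, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max (k_max is hypothetical)."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        n_pairs = len(window) - k - 1  # number of positions that have a partner k residues apart
        for i in range(max(n_pairs, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # pairs that contain padding are skipped
                counts[pair] += 1
        denom = n_pairs if n_pairs > 0 else 1
        features.extend(counts[p] / denom for p in pairs)
    return features


def fused_features(window, k_max=2):
    """Concatenate the three encodings into one feature vector."""
    return binary_encoding(window) + aac_encoding(window) + cksaap_encoding(window, k_max)
```

A vector built this way for each lysine-centered window could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`; that choice is likewise an assumption and is not stated in this record.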