Prediction of O-glycosylation Sites Using Random Forest and GA-Tuned PSO Technique
Main authors: | Hassan, Hebatallah; Badr, Amr; Abdelhalim, MB |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Libertas Academica, 2015 |
Subjects: | Original Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4494626/ https://www.ncbi.nlm.nih.gov/pubmed/26244014 http://dx.doi.org/10.4137/BBI.S26864 |
author | Hassan, Hebatallah; Badr, Amr; Abdelhalim, MB |
---|---|
collection | PubMed |
description | O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs on serine (S) or threonine (T) residues. Several O-glycosylation site predictors have been developed, but better prediction tools are still needed. One challenge in training the classifiers is that the available datasets are highly imbalanced, which makes classification accuracy for the minority class unsatisfactory. In our previous work, we proposed a classification approach based on particle swarm optimization (PSO) and random forest (RF) that takes the dataset imbalance into account. The PSO parameter settings used during training affect the classification accuracy. Thus, in this paper, we optimize the PSO parameters with a genetic algorithm (GA) in order to increase the classification accuracy. Our proposed GA-based approach outperforms existing predictors in terms of the area under the receiver operating characteristic curve (AUC). In addition, we implemented a glycosylation prediction tool based on this approach and demonstrated that it could successfully identify candidate glycosylation sites in a case-study protein. |
format | Online Article Text |
id | pubmed-4494626 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-4494626; 2015-08-04; Bioinform Biol Insights; Original Research; Libertas Academica, 2015-07-05; © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article published under the Creative Commons CC-BY-NC 3.0 license. |
title | Prediction of O-glycosylation Sites Using Random Forest and GA-Tuned PSO Technique |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4494626/ https://www.ncbi.nlm.nih.gov/pubmed/26244014 http://dx.doi.org/10.4137/BBI.S26864 |
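The abstract describes tuning the parameters of a PSO algorithm with a genetic algorithm. As a rough illustration of that general idea only, the Python sketch below runs a small real-coded GA over three standard PSO coefficients (inertia weight `w`, cognitive coefficient `c1`, social coefficient `c2`) and scores each setting by how well a global-best PSO minimizes a toy sphere function. The objective, the search ranges in `BOUNDS`, the GA operators (binary tournament, blend crossover, Gaussian mutation, elitism), and every function name are illustrative assumptions. The paper's actual PSO encoding, its random-forest fitness, and its handling of the imbalanced data are not described in this record and are not reproduced here.

```python
import random

# Hypothetical search ranges for the PSO coefficients (assumed, not taken from the paper):
# w = inertia weight, c1 = cognitive coefficient, c2 = social coefficient.
BOUNDS = [(0.1, 1.0), (0.5, 2.5), (0.5, 2.5)]


def sphere(x):
    """Toy objective standing in for the real classification fitness (minimum 0)."""
    return sum(v * v for v in x)


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def pso(objective, dim, w, c1, c2, rng, n_particles=15, n_iters=40, lim=5.0):
    """Global-best PSO minimizing `objective`; returns the best value found."""
    pos = [[rng.uniform(-lim, lim) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = clip(pos[i][d] + vel[i][d], -lim, lim)  # stay in the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest_val


def fitness(params, rng, dim=5, runs=2):
    """GA fitness of a (w, c1, c2) triple: mean best PSO value over a few runs (lower is better)."""
    w, c1, c2 = params
    return sum(pso(sphere, dim, w, c1, c2, rng) for _ in range(runs)) / runs


def tournament(scored, rng):
    """Binary tournament selection: the better of two random (score, params) pairs."""
    a, b = rng.sample(scored, 2)
    return min(a, b)[1]


def ga_tune_pso(pop_size=10, generations=8, mut_sd=0.15, seed=0):
    """Real-coded GA over (w, c1, c2); returns (best_score, best_params, history)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    scored = sorted((fitness(ind, rng), ind) for ind in pop)
    history = [scored[0][0]]
    for _ in range(generations):
        nxt = [scored[0]]  # elitism: carry the best individual (and its score) over
        while len(nxt) < pop_size:
            p1, p2 = tournament(scored, rng), tournament(scored, rng)
            alpha = rng.random()  # arithmetic (blend) crossover
            child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            child = [clip(gene + rng.gauss(0.0, mut_sd), lo, hi)  # Gaussian mutation
                     for gene, (lo, hi) in zip(child, BOUNDS)]
            nxt.append((fitness(child, rng), child))
        scored = sorted(nxt)
        history.append(scored[0][0])
    best_score, best_params = scored[0]
    return best_score, best_params, history


if __name__ == "__main__":
    score, (w, c1, c2), _ = ga_tune_pso()
    print(f"best mean PSO value {score:.3g} with w={w:.2f}, c1={c1:.2f}, c2={c2:.2f}")
```

Because the elite individual keeps its score, the best value in `history` never increases from one generation to the next. In a real classification setting, `sphere` would presumably be replaced by a cross-validated classifier score such as AUC; since every GA fitness evaluation runs PSO several times, that objective would dominate the total cost.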