
Prediction of O-glycosylation Sites Using Random Forest and GA-Tuned PSO Technique

O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs at specific serine (S) or threonine (T) residues. Several O-glycosylation site predictors have been developed. However, a need for better prediction tools remains. One challenge in training the cl...


Bibliographic Details
Main Authors: Hassan, Hebatallah; Badr, Amr; Abdelhalim, MB
Format: Online Article Text
Language: English
Published: Libertas Academica 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4494626/
https://www.ncbi.nlm.nih.gov/pubmed/26244014
http://dx.doi.org/10.4137/BBI.S26864
author Hassan, Hebatallah
Badr, Amr
Abdelhalim, MB
collection PubMed
description O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs at specific serine (S) or threonine (T) residues. Several O-glycosylation site predictors have been developed. However, a need for better prediction tools remains. One challenge in training the classifiers is that the available datasets are highly imbalanced, which makes the classification accuracy for the minority class unsatisfactory. In our previous work, we proposed a new classification approach based on particle swarm optimization (PSO) and random forest (RF) that addresses the imbalanced dataset problem. The PSO parameter settings used in the training process affect the classification accuracy. Thus, in this paper, we optimize the PSO parameters with a genetic algorithm in order to increase the classification accuracy. Our proposed genetic algorithm-based approach shows better performance than existing predictors in terms of area under the receiver operating characteristic curve. In addition, we implemented a glycosylation predictor tool based on this approach and demonstrated that it could successfully identify candidate glycosylation sites in a case study protein.
format Online
Article
Text
id pubmed-4494626
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-4494626 2015-08-04 Prediction of O-glycosylation Sites Using Random Forest and GA-Tuned PSO Technique. Hassan, Hebatallah; Badr, Amr; Abdelhalim, MB. Bioinform Biol Insights. Original Research. Libertas Academica 2015-07-05 /pmc/articles/PMC4494626/ /pubmed/26244014 http://dx.doi.org/10.4137/BBI.S26864 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article published under the Creative Commons CC-BY-NC 3.0 license.
title Prediction of O-glycosylation Sites Using Random Forest and GA-Tuned PSO Technique
topic Original Research