
EPuL: An Enhanced Positive-Unlabeled Learning Algorithm for the Prediction of Pupylation Sites

Protein pupylation is a type of post-translational modification that plays a crucial role in the cellular function of bacteria. To gain better insight into the mechanisms underlying pupylation, an initial but important step is to identify pupylation sites. To date, several com...


Bibliographic Details
Main Authors: Nan, Xuanguo; Bao, Lingling; Zhao, Xiaosa; Zhao, Xiaowei; Sangaiah, Arun Kumar; Wang, Gai-Ge; Ma, Zhiqiang
Format: Online Article Text
Language: English
Published: MDPI 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6151806/
https://www.ncbi.nlm.nih.gov/pubmed/28872627
http://dx.doi.org/10.3390/molecules22091463
author Nan, Xuanguo
Bao, Lingling
Zhao, Xiaosa
Zhao, Xiaowei
Sangaiah, Arun Kumar
Wang, Gai-Ge
Ma, Zhiqiang
collection PubMed
description Protein pupylation is a type of post-translational modification that plays a crucial role in the cellular function of bacteria. To gain better insight into the mechanisms underlying pupylation, an initial but important step is to identify pupylation sites. To date, several computational methods have been established for the prediction of pupylation sites; these usually construct the negative samples artificially from verified pupylation proteins in order to train the classifiers. However, if this process is not done properly, it can dramatically affect the performance of the final predictor. In this work, unlike previous computational methods, we propose an enhanced positive-unlabeled learning algorithm (EPuL) for the pupylation site prediction problem, which uses only positive and unlabeled samples. First, we separate the training dataset into the positive dataset and the unlabeled dataset, which contains the remaining non-annotated lysine residues. Then, the EPuL algorithm is used to select an initial set of reliable negative samples and then iteratively pick out non-pupylation sites. The proposed method achieved an accuracy of 90.24%, an area under the curve (AUC) of 0.93, and a Matthews correlation coefficient (MCC) of 0.81 in 10-fold cross-validation. A user-friendly web server for predicting pupylation sites was developed and made freely available at http://59.73.198.144:8080/EPuL.
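The two-step idea summarized in the description (choose an initial set of reliable negatives from the unlabeled samples, then iteratively add more) can be illustrated with a generic positive-unlabeled sketch. This is not the authors' EPuL implementation: the centroid-distance scoring, the function names `centroid_score` and `two_step_pu`, and the parameters `init_frac`, `step_frac`, and `max_iter` are all assumptions chosen for brevity, and the record does not say which classifier or features the paper uses.

```python
import numpy as np


def centroid_score(X, pos, neg):
    """Positive score: row is closer to the positive centroid than to the negative one."""
    mu_p = pos.mean(axis=0)
    mu_n = neg.mean(axis=0)
    return np.linalg.norm(X - mu_n, axis=1) - np.linalg.norm(X - mu_p, axis=1)


def two_step_pu(X_pos, X_unl, init_frac=0.2, step_frac=0.1, max_iter=10):
    """Generic two-step PU sketch (hypothetical, not EPuL itself).

    Returns a boolean mask over X_unl marking samples treated as negatives.
    """
    n_unl = len(X_unl)
    is_neg = np.zeros(n_unl, dtype=bool)

    # Step 1: the unlabeled samples farthest from the positive centroid
    # form the initial "reliable negative" set.
    dist = np.linalg.norm(X_unl - X_pos.mean(axis=0), axis=1)
    n_init = max(1, int(init_frac * n_unl))
    is_neg[np.argsort(-dist)[:n_init]] = True

    # Step 2: repeatedly re-score the remaining unlabeled samples and move
    # the most negative-looking ones into the negative set.
    n_add = max(1, int(step_frac * n_unl))
    for _ in range(max_iter):
        remaining = np.flatnonzero(~is_neg)
        if remaining.size == 0:
            break
        scores = centroid_score(X_unl[remaining], X_pos, X_unl[is_neg])
        mask = scores < 0
        if not mask.any():
            break  # nothing left that looks negative
        candidates = remaining[mask]
        picked = candidates[np.argsort(scores[mask])[:n_add]]
        is_neg[picked] = True
    return is_neg


if __name__ == "__main__":
    # Synthetic stand-in for feature vectors of lysine residues (assumed data).
    rng = np.random.default_rng(0)
    X_pos = rng.normal(3.0, 0.5, size=(30, 2))
    X_unl = np.vstack([
        rng.normal(3.0, 0.5, size=(20, 2)),   # hidden positives
        rng.normal(-3.0, 0.5, size=(80, 2)),  # true negatives
    ])
    neg_mask = two_step_pu(X_pos, X_unl)
    print("negatives selected:", int(neg_mask.sum()), "of", len(X_unl))
```

In practice the selected negatives and the known positives would then be used to train a final classifier, which is where figures such as accuracy, AUC, and MCC would be measured by cross-validation.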
format Online
Article
Text
id pubmed-6151806
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6151806 2018-11-13 Molecules Article MDPI 2017-09-05 /pmc/articles/PMC6151806/ /pubmed/28872627 http://dx.doi.org/10.3390/molecules22091463 Text en © 2017 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title EPuL: An Enhanced Positive-Unlabeled Learning Algorithm for the Prediction of Pupylation Sites
topic Article