EPuL: An Enhanced Positive-Unlabeled Learning Algorithm for the Prediction of Pupylation Sites
Protein pupylation is a type of post-translational modification that plays a crucial role in the cellular function of prokaryotic (bacterial) organisms. Identifying pupylation sites is an initial but important step toward better insight into the mechanisms underlying pupylation. To date, several com...
Main authors: | Nan, Xuanguo; Bao, Lingling; Zhao, Xiaosa; Zhao, Xiaowei; Sangaiah, Arun Kumar; Wang, Gai-Ge; Ma, Zhiqiang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6151806/ https://www.ncbi.nlm.nih.gov/pubmed/28872627 http://dx.doi.org/10.3390/molecules22091463 |
_version_ | 1783357235627294720 |
---|---|
author | Nan, Xuanguo Bao, Lingling Zhao, Xiaosa Zhao, Xiaowei Sangaiah, Arun Kumar Wang, Gai-Ge Ma, Zhiqiang |
author_facet | Nan, Xuanguo Bao, Lingling Zhao, Xiaosa Zhao, Xiaowei Sangaiah, Arun Kumar Wang, Gai-Ge Ma, Zhiqiang |
author_sort | Nan, Xuanguo |
collection | PubMed |
description | Protein pupylation is a type of post-translational modification that plays a crucial role in the cellular function of prokaryotic (bacterial) organisms. Identifying pupylation sites is an initial but important step toward better insight into the mechanisms underlying pupylation. To date, several computational methods have been established for predicting pupylation sites; they usually construct negative samples artificially from verified pupylation proteins to train the classifiers. However, if this process is not done properly, it can dramatically affect the performance of the final predictor. In this work, unlike previous computational methods, we propose an enhanced positive-unlabeled learning algorithm (EPuL) for the pupylation site prediction problem, which uses only positive and unlabeled samples. First, we separate the training dataset into a positive dataset and an unlabeled dataset, which contains the remaining non-annotated lysine residues. Then, the EPuL algorithm selects an initial set of reliable negatives and iteratively picks out further non-pupylation sites. In 10-fold cross-validation, the proposed method achieved an accuracy of 90.24%, an area under the curve (AUC) of 0.93, and a Matthews correlation coefficient (MCC) of 0.81. A user-friendly web server for predicting pupylation sites was developed and made freely available at http://59.73.198.144:8080/EPuL. |
format | Online Article Text |
id | pubmed-6151806 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-6151806 2018-11-13 EPuL: An Enhanced Positive-Unlabeled Learning Algorithm for the Prediction of Pupylation Sites Nan, Xuanguo Bao, Lingling Zhao, Xiaosa Zhao, Xiaowei Sangaiah, Arun Kumar Wang, Gai-Ge Ma, Zhiqiang Molecules Article Protein pupylation is a type of post-translational modification that plays a crucial role in the cellular function of prokaryotic (bacterial) organisms. Identifying pupylation sites is an initial but important step toward better insight into the mechanisms underlying pupylation. To date, several computational methods have been established for predicting pupylation sites; they usually construct negative samples artificially from verified pupylation proteins to train the classifiers. However, if this process is not done properly, it can dramatically affect the performance of the final predictor. In this work, unlike previous computational methods, we propose an enhanced positive-unlabeled learning algorithm (EPuL) for the pupylation site prediction problem, which uses only positive and unlabeled samples. First, we separate the training dataset into a positive dataset and an unlabeled dataset, which contains the remaining non-annotated lysine residues. Then, the EPuL algorithm selects an initial set of reliable negatives and iteratively picks out further non-pupylation sites. In 10-fold cross-validation, the proposed method achieved an accuracy of 90.24%, an area under the curve (AUC) of 0.93, and a Matthews correlation coefficient (MCC) of 0.81. A user-friendly web server for predicting pupylation sites was developed and made freely available at http://59.73.198.144:8080/EPuL. MDPI 2017-09-05 /pmc/articles/PMC6151806/ /pubmed/28872627 http://dx.doi.org/10.3390/molecules22091463 Text en © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Nan, Xuanguo Bao, Lingling Zhao, Xiaosa Zhao, Xiaowei Sangaiah, Arun Kumar Wang, Gai-Ge Ma, Zhiqiang EPuL: An Enhanced Positive-Unlabeled Learning Algorithm for the Prediction of Pupylation Sites |
title | EPuL: An Enhanced Positive-Unlabeled Learning Algorithm for the Prediction of Pupylation Sites |
title_sort | epul: an enhanced positive-unlabeled learning algorithm for the prediction of pupylation sites |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6151806/ https://www.ncbi.nlm.nih.gov/pubmed/28872627 http://dx.doi.org/10.3390/molecules22091463 |
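
The two-step idea summarized in the description (seed a set of reliable negatives from the unlabeled lysines, then iteratively pick out further non-pupylation sites) can be illustrated with a minimal sketch. This is not the authors' EPuL implementation: the record does not specify the features, the classifier, or the stopping rule, so the nearest-centroid rule, the function names `pu_two_step` and `mcc`, the `seed_fraction` and `max_iter` parameters, and the toy data are all assumptions chosen for illustration. The `mcc` helper uses the standard Matthews correlation coefficient formula, one of the metrics the description reports.

```python
import math


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pu_two_step(positives, unlabeled, seed_fraction=0.2, max_iter=10):
    """Hypothetical two-step PU sketch; returns (negatives, remaining_unlabeled).

    A nearest-centroid rule stands in for whatever classifier EPuL uses.
    """
    pos_c = centroid(positives)
    # Step 1: seed reliable negatives with the unlabeled samples that lie
    # farthest from the positive centroid.
    ranked = sorted(unlabeled, key=lambda x: dist(x, pos_c), reverse=True)
    n_seed = max(1, int(len(ranked) * seed_fraction))
    negatives, remaining = ranked[:n_seed], ranked[n_seed:]
    # Step 2: repeatedly move unlabeled samples that are closer to the
    # negative centroid than to the positive centroid into the negative set.
    for _ in range(max_iter):
        if not remaining:
            break
        neg_c = centroid(negatives)
        moved = [x for x in remaining if dist(x, neg_c) < dist(x, pos_c)]
        if not moved:
            break
        remaining = [x for x in remaining if dist(x, neg_c) >= dist(x, pos_c)]
        negatives.extend(moved)
    return negatives, remaining


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy 2-D feature vectors, invented purely for illustration.
    positives = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
    unlabeled = [[1.1, 1.0], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1], [3.5, 3.5]]
    negatives, remaining = pu_two_step(positives, unlabeled)
    print("picked as negative:", negatives)
    print("still unlabeled:", remaining)
```

In this toy run, samples far from the positive cluster end up in the negative set, while the sample close to the positives stays unlabeled. In practice, the negatives selected this way would be combined with the known positives to train and cross-validate a final classifier.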