
LipoSVM: Prediction of Lysine Lipoylation in Proteins based on the Support Vector Machine



Bibliographic Details
Main authors: Wu, Meiqi, Lu, Pengchao, Yang, Yingxi, Liu, Liwen, Wang, Hui, Xu, Yan, Chu, Jixun
Format: Online Article Text
Language: English
Published: Bentham Science Publishers 2019
Subjects: Genomics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7235397/
https://www.ncbi.nlm.nih.gov/pubmed/32476993
http://dx.doi.org/10.2174/1389202919666191014092843
author Wu, Meiqi
Lu, Pengchao
Yang, Yingxi
Liu, Liwen
Wang, Hui
Xu, Yan
Chu, Jixun
collection PubMed
description BACKGROUND: Lysine lipoylation, a rare and highly conserved post-translational modification of proteins, is considered one of the most important processes in biology. A comprehensive understanding of the regulatory mechanism of lysine lipoylation depends on identifying lysine lipoylation sites. Because experimental methods for identifying them are expensive and laborious, computational methods for predicting lipoylation sites are urgently needed. METHODOLOGY: In this work, a predictor named LipoSVM is developed to predict lipoylation sites accurately. To address the imbalance between positive and negative samples, the synthetic minority over-sampling technique (SMOTE) is used to balance them. In addition, training sets with different ratios of positive to negative samples are compared. RESULTS: After comparing five encoding schemes and five classification algorithms, LipoSVM is built from a training set with a 1:1 ratio of positive to negative samples, combining position-specific scoring matrix (PSSM) features with a support vector machine (SVM). The best model achieves an accuracy of 99.98% and an AUC of 0.9996 in 10-fold cross-validation. On an independent test set the AUC reaches 0.9997, demonstrating the robustness of LipoSVM. Statistical analysis shows significant differences between lipoylated and non-lipoylated lysine fragments. CONCLUSION: An effective predictor for lysine lipoylation is built from position-specific scoring matrix features and a support vector machine. LipoSVM is freely available for download at https://github.com/stars20180811/LipoSVM.
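The abstract states that SMOTE is used to balance positive and negative samples but gives no implementation details. As a rough illustration only, the Python sketch below shows the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The neighbour count `k`, the Euclidean distance, the fixed random seed, the helper names `euclidean` and `smote`, and the toy two-dimensional vectors are all assumptions for illustration; LipoSVM's actual features are PSSM-derived and its SMOTE settings are not given in this record.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (a SMOTE-style sketch).

    Each synthetic point is x + gap * (neighbour - x), where x is a randomly
    chosen minority sample, neighbour is one of its k nearest minority
    neighbours, and gap is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist

    # Indices of the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # base sample
        j = rng.choice(neighbours[i])     # one of its nearest neighbours
        gap = rng.random()                # interpolation factor in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Toy example (hypothetical 2-D features, not LipoSVM's PSSM encoding):
# 3 positives vs 10 negatives, oversampled to a 1:1 ratio.
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
negatives = [[5.0, 5.0] for _ in range(10)]
extra = smote(positives, len(negatives) - len(positives), k=2)
balanced_positives = positives + extra
print(len(balanced_positives), len(negatives))  # 10 10
```

Because every synthetic point is an interpolation, it always lies between two real minority samples. When a model is evaluated by cross-validation, oversampling is normally applied to the training folds only, so that no synthetic point is derived from a held-out sample.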
format Online
Article
Text
id pubmed-7235397
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Bentham Science Publishers
record_format MEDLINE/PubMed
spelling pubmed-7235397 2020-05-29 Curr Genomics Genomics
Bentham Science Publishers 2019-08 /pmc/articles/PMC7235397/ /pubmed/32476993 http://dx.doi.org/10.2174/1389202919666191014092843 Text en © 2019 Bentham Science Publishers https://creativecommons.org/licenses/by-nc/4.0/legalcode This is an open access article licensed under the terms of the Creative Commons Attribution-Non-Commercial 4.0 International Public License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/legalcode), which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.
title LipoSVM: Prediction of Lysine Lipoylation in Proteins based on the Support Vector Machine
topic Genomics
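The record's abstract reports accuracy and AUC from 10-fold cross-validation and from an independent test set. The Python sketch below illustrates how those two metrics, and a simple 10-fold split, can be computed. It assumes binary labels (1 = lipoylated, 0 = not lipoylated) and real-valued classifier scores such as SVM decision values thresholded at 0; these conventions, the hypothetical helper names `roc_auc`, `accuracy` and `k_fold_indices`, the unshuffled contiguous folds, and the toy numbers are illustrative assumptions, not details taken from the LipoSVM paper.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney pair-counting formulation.

    AUC equals the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.0):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s > threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds of near-equal size.

    Real cross-validation usually shuffles (and often stratifies) first.
    """
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Hypothetical SVM decision values for four test fragments.
labels = [1, 1, 0, 0]
scores = [1.2, -0.3, 0.4, -2.0]
print(roc_auc(labels, scores))   # 0.75
print(accuracy(labels, scores))  # 0.5
print([len(f) for f in k_fold_indices(25)])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

AUC is threshold-independent, whereas accuracy depends on the chosen cut-off. The pair-counting loop costs O(P × N) for P positives and N negatives, which is fine for small sets; a rank-based formula gives the same value more efficiently on large ones.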