Identification of Cancerlectins Using Support Vector Machines With Fusion of G-Gap Dipeptide

The cancerlectin plays an important role in the initiation, survival, growth, metastasis, and spread of cancer. Studying the function of cancerlectins is therefore significant because it can help identify tumor markers and support tumor prevention, treatment, and prognosis. However, plenty of stu...


Bibliographic Details
Main Authors: Qian, Lili; Wen, Yaping; Han, Guosheng
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7147460/
https://www.ncbi.nlm.nih.gov/pubmed/32318092
http://dx.doi.org/10.3389/fgene.2020.00275
_version_ 1783520425863544832
author Qian, Lili
Wen, Yaping
Han, Guosheng
collection PubMed
description The cancerlectin plays an important role in the initiation, survival, growth, metastasis, and spread of cancer. Studying the function of cancerlectins is therefore significant because it can help identify tumor markers and support tumor prevention, treatment, and prognosis. However, numerous studies have generated a large amount of protein data, and traditional prediction methods can no longer meet the needs of analysis. Developing powerful computational models based on these data to discriminate cancerlectins from non-cancerlectins on a large scale has become one of the most important topics. In this study, we developed a feature extraction method to identify cancerlectins based on the fusion of g-gap dipeptides. Analysis of variance was used to select the optimal feature set, and a support vector machine was used to classify the data. Rigorous nested 10-fold cross-validation showed that our method achieved a prediction accuracy of 83.91% and a sensitivity of 83.15%. To further evaluate the classification model constructed in this work, we built a new data set, on which the prediction accuracy reaches 83.3%. Experimental results show that our method outperforms state-of-the-art methods.
format Online
Article
Text
id pubmed-7147460
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7147460 2020-04-21 Identification of Cancerlectins Using Support Vector Machines With Fusion of G-Gap Dipeptide Qian, Lili Wen, Yaping Han, Guosheng Front Genet Genetics Frontiers Media S.A. 2020-04-03 /pmc/articles/PMC7147460/ /pubmed/32318092 http://dx.doi.org/10.3389/fgene.2020.00275 Text en Copyright © 2020 Qian, Wen and Han. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Identification of Cancerlectins Using Support Vector Machines With Fusion of G-Gap Dipeptide
topic Genetics
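
The description field in this record names a feature extraction method based on the fusion of g-gap dipeptides but does not define it. The following is a minimal sketch, assuming the commonly used definition: for a gap g, count every residue pair separated by g positions and normalize the 400 counts to frequencies. Treating "fusion" as concatenation of the vectors for several gaps is also an assumption. The function names, the gap values (0, 1, 2), and the choice to skip non-standard residues are hypothetical and are not taken from the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered residue pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(sequence, g):
    """Return 400 frequencies of residue pairs separated by g residues.

    g = 0 gives the ordinary (adjacent) dipeptide composition.
    Pairs containing a non-standard residue are skipped (an assumption),
    and frequencies are normalized by the number of pairs counted.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    span = g + 1  # distance between the two residues of a pair
    total = 0
    for i in range(len(seq) - span):
        pair = seq[i] + seq[i + span]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid pairs): return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def fused_features(sequence, gaps=(0, 1, 2)):
    """Concatenate g-gap compositions for several gaps (assumed fusion)."""
    features = []
    for g in gaps:
        features.extend(g_gap_dipeptide_composition(sequence, g))
    return features
```

With three gap values, `fused_features` yields a 1200-dimensional vector per sequence. A feature selection step such as the analysis of variance mentioned in the description would then reduce this vector before classification.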