A Two-Step Feature Selection Method to Predict Cancerlectins by Multiview Features and Synthetic Minority Oversampling Technique

Cancerlectins have an inhibitory effect on the growth of cancer cells and are currently being employed as therapeutic agents. The accurate identification of the cancerlectins should provide insight into the molecular mechanisms of cancers. In this study, a new computational method based on the RF (R...

Bibliographic Details
Main authors: Yang, Runtao; Zhang, Chengjin; Zhang, Lina; Gao, Rui
Format: Online Article Text
Language: English
Published: Hindawi 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5820548/
https://www.ncbi.nlm.nih.gov/pubmed/29568772
http://dx.doi.org/10.1155/2018/9364182
author Yang, Runtao
Zhang, Chengjin
Zhang, Lina
Gao, Rui
collection PubMed
description Cancerlectins inhibit the growth of cancer cells and are currently employed as therapeutic agents. Accurate identification of cancerlectins should provide insight into the molecular mechanisms of cancers. In this study, a new computational method based on the RF (Random Forest) algorithm is proposed to further improve the performance of cancerlectin identification. Before feature selection, a hybrid feature space is built by combining four individual feature spaces: CTD (Composition, Transition, and Distribution), PseAAC (Pseudo Amino Acid Composition), PSSM (Position-Specific Scoring Matrix), and disorder. SMOTE (Synthetic Minority Oversampling Technique) is applied to address the imbalanced data problem. To reduce feature redundancy and computational complexity, we propose a two-step feature selection process to select informative features. 5-fold cross-validation is used to evaluate the various prediction strategies. The proposed method achieves a sensitivity of 0.779, a specificity of 0.717, an accuracy of 0.748, and an MCC (Matthews Correlation Coefficient) of 0.497. The prediction results are also compared with those of existing methods on the same dataset using 5-fold cross-validation. The comparison demonstrates the effectiveness of our method for predicting cancerlectins.
format Online
Article
Text
id pubmed-5820548
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-5820548 2018-03-22 A Two-Step Feature Selection Method to Predict Cancerlectins by Multiview Features and Synthetic Minority Oversampling Technique Yang, Runtao; Zhang, Chengjin; Zhang, Lina; Gao, Rui Biomed Res Int Research Article Hindawi 2018-02-07 /pmc/articles/PMC5820548/ /pubmed/29568772 http://dx.doi.org/10.1155/2018/9364182 Text en Copyright © 2018 Runtao Yang et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Two-Step Feature Selection Method to Predict Cancerlectins by Multiview Features and Synthetic Minority Oversampling Technique
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5820548/
https://www.ncbi.nlm.nih.gov/pubmed/29568772
http://dx.doi.org/10.1155/2018/9364182