A Consistency-Based Feature Selection Method Allied with Linear SVMs for HIV-1 Protease Cleavage Site Prediction

BACKGROUND: Predicting type-1 Human Immunodeficiency Virus (HIV-1) protease cleavage sites in protein molecules and determining its specificity is an important task that has attracted considerable attention in the research community. Achievements in this area are expected to result in effective drug...

Bibliographic Details
Main authors: Öztürk, Orkun, Aksaç, Alper, Elsheikh, Abdallah, Özyer, Tansel, Alhajj, Reda
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3751940/
https://www.ncbi.nlm.nih.gov/pubmed/24058397
http://dx.doi.org/10.1371/journal.pone.0063145
_version_ 1782281707279351808
author Öztürk, Orkun
Aksaç, Alper
Elsheikh, Abdallah
Özyer, Tansel
Alhajj, Reda
author_facet Öztürk, Orkun
Aksaç, Alper
Elsheikh, Abdallah
Özyer, Tansel
Alhajj, Reda
author_sort Öztürk, Orkun
collection PubMed
description BACKGROUND: Predicting type-1 Human Immunodeficiency Virus (HIV-1) protease cleavage sites in protein molecules and determining its specificity is an important task that has attracted considerable attention in the research community. Achievements in this area are expected to result in effective drug design (especially for HIV-1 protease inhibitors) against this life-threatening virus. However, some drawbacks (such as the shortage of available training data and the high dimensionality of the feature space) turn this task into a difficult classification problem. Thus, various machine learning techniques, specifically several classification methods, have been proposed in order to increase the accuracy of the classification model. In addition, for classification problems characterized by few samples and many features, selecting the most relevant features is a major factor in increasing classification accuracy. RESULTS: We propose for HIV-1 data a consistency-based feature selection approach in conjunction with recursive feature elimination of support vector machines (SVMs). We used various classifiers to evaluate the results obtained from the feature selection process. We further demonstrated the effectiveness of our proposed method by comparing it with a state-of-the-art feature selection method applied to HIV-1 data, and we evaluated the reported results based on attributes selected from different combinations. CONCLUSION: Applying feature selection to training data before performing the classification task seems to be a reasonable data-mining process when working with types of data similar to HIV-1. On HIV-1 data, some feature selection or extraction operations in conjunction with different classifiers have been tested and noteworthy outcomes have been reported. These facts motivate the work presented in this paper.
SOFTWARE AVAILABILITY: The software is available at http://ozyer.etu.edu.tr/c-fs-svm.rar. The software can be downloaded at esnag.etu.edu.tr/software/hiv_cleavage_site_prediction.rar; a readme file there explains how to set up the software.
format Online
Article
Text
id pubmed-3751940
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-37519402013-09-20 A Consistency-Based Feature Selection Method Allied with Linear SVMs for HIV-1 Protease Cleavage Site Prediction Öztürk, Orkun Aksaç, Alper Elsheikh, Abdallah Özyer, Tansel Alhajj, Reda PLoS One Research Article BACKGROUND: Predicting type-1 Human Immunodeficiency Virus (HIV-1) protease cleavage site in protein molecules and determining its specificity is an important task which has attracted considerable attention in the research community. Achievements in this area are expected to result in effective drug design (especially for HIV-1 protease inhibitors) against this life-threatening virus. However, some drawbacks (like the shortage of the available training data and the high dimensionality of the feature space) turn this task into a difficult classification problem. Thus, various machine learning techniques, and specifically several classification methods have been proposed in order to increase the accuracy of the classification model. In addition, for several classification problems, which are characterized by having few samples and many features, selecting the most relevant features is a major factor for increasing classification accuracy. RESULTS: We propose for HIV-1 data a consistency-based feature selection approach in conjunction with recursive feature elimination of support vector machines (SVMs). We used various classifiers for evaluating the results obtained from the feature selection process. We further demonstrated the effectiveness of our proposed method by comparing it with a state-of-the-art feature selection method applied on HIV-1 data, and we evaluated the reported results based on attributes which have been selected from different combinations. CONCLUSION: Applying feature selection on training data before realizing the classification task seems to be a reasonable data-mining process when working with types of data similar to HIV-1. 
On HIV-1 data, some feature selection or extraction operations in conjunction with different classifiers have been tested and noteworthy outcomes have been reported. These facts motivate for the work presented in this paper. SOFTWARE AVAILABILITY: The software is available at http://ozyer.etu.edu.tr/c-fs-svm.rar. The software can be downloaded at esnag.etu.edu.tr/software/hiv_cleavage_site_prediction.rar; you will find a readme file which explains how to set the software in order to work. Public Library of Science 2013-08-23 /pmc/articles/PMC3751940/ /pubmed/24058397 http://dx.doi.org/10.1371/journal.pone.0063145 Text en © 2013 Öztürk et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Öztürk, Orkun
Aksaç, Alper
Elsheikh, Abdallah
Özyer, Tansel
Alhajj, Reda
A Consistency-Based Feature Selection Method Allied with Linear SVMs for HIV-1 Protease Cleavage Site Prediction
title A Consistency-Based Feature Selection Method Allied with Linear SVMs for HIV-1 Protease Cleavage Site Prediction
title_full A Consistency-Based Feature Selection Method Allied with Linear SVMs for HIV-1 Protease Cleavage Site Prediction
title_fullStr A Consistency-Based Feature Selection Method Allied with Linear SVMs for HIV-1 Protease Cleavage Site Prediction
title_full_unstemmed A Consistency-Based Feature Selection Method Allied with Linear SVMs for HIV-1 Protease Cleavage Site Prediction
title_short A Consistency-Based Feature Selection Method Allied with Linear SVMs for HIV-1 Protease Cleavage Site Prediction
title_sort consistency-based feature selection method allied with linear svms for hiv-1 protease cleavage site prediction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3751940/
https://www.ncbi.nlm.nih.gov/pubmed/24058397
http://dx.doi.org/10.1371/journal.pone.0063145
work_keys_str_mv AT ozturkorkun aconsistencybasedfeatureselectionmethodalliedwithlinearsvmsforhiv1proteasecleavagesiteprediction
AT aksacalper aconsistencybasedfeatureselectionmethodalliedwithlinearsvmsforhiv1proteasecleavagesiteprediction
AT elsheikhabdallah aconsistencybasedfeatureselectionmethodalliedwithlinearsvmsforhiv1proteasecleavagesiteprediction
AT ozyertansel aconsistencybasedfeatureselectionmethodalliedwithlinearsvmsforhiv1proteasecleavagesiteprediction
AT alhajjreda aconsistencybasedfeatureselectionmethodalliedwithlinearsvmsforhiv1proteasecleavagesiteprediction
AT ozturkorkun consistencybasedfeatureselectionmethodalliedwithlinearsvmsforhiv1proteasecleavagesiteprediction
AT aksacalper consistencybasedfeatureselectionmethodalliedwithlinearsvmsforhiv1proteasecleavagesiteprediction
AT elsheikhabdallah consistencybasedfeatureselectionmethodalliedwithlinearsvmsforhiv1proteasecleavagesiteprediction
AT ozyertansel consistencybasedfeatureselectionmethodalliedwithlinearsvmsforhiv1proteasecleavagesiteprediction
AT alhajjreda consistencybasedfeatureselectionmethodalliedwithlinearsvmsforhiv1proteasecleavagesiteprediction
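
The abstract in this record names consistency-based feature selection but does not say how consistency is measured. As a rough illustration only, the sketch below uses the inconsistency rate commonly found in the feature selection literature: samples that share the same values on the chosen features but carry a minority class label count as inconsistent. The function names, the greedy backward search, and the toy residue data are assumptions made for this example, not the authors' implementation, and the SVM recursive feature elimination step is omitted.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Share of samples whose label differs from the majority label
    among samples with identical values on the features in `subset`."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[i] for i in subset)
        groups[pattern][label] += 1
    # For each pattern, every sample outside the majority class is inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


def backward_select(rows, labels, threshold=0.0):
    """Hypothetical greedy backward elimination: drop a feature whenever
    the remaining subset keeps the inconsistency rate within `threshold`."""
    subset = list(range(len(rows[0])))
    for feature in list(subset):
        candidate = [f for f in subset if f != feature]
        if candidate and inconsistency_rate(rows, labels, candidate) <= threshold:
            subset = candidate
    return subset


# Toy data (invented for illustration): three categorical positions per sample,
# where only position 1 determines the class label.
rows = [
    ("A", "L", "F"),
    ("G", "L", "Y"),
    ("A", "V", "F"),
    ("G", "V", "Y"),
    ("A", "L", "Y"),
    ("G", "V", "F"),
]
labels = [1, 1, 0, 0, 1, 0]

if __name__ == "__main__":
    print(backward_select(rows, labels))
```

On this toy data the search keeps only the position that separates the two classes. In a pipeline like the one the abstract describes, a subset chosen this way would then be passed to a classifier-based ranking step such as SVM recursive feature elimination.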