BLProt: prediction of bioluminescent proteins based on support vector machine and relieff feature selection

BACKGROUND: Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi, etc. also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many...

Bibliographic Details
Main authors: Kandaswamy, Krishna Kumar; Pugalenthi, Ganesan; Hazrati, Mehrnaz Khodam; Kalies, Kai-Uwe; Martinetz, Thomas
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176267/
https://www.ncbi.nlm.nih.gov/pubmed/21849049
http://dx.doi.org/10.1186/1471-2105-12-345
_version_ 1782212207335964672
author Kandaswamy, Krishna Kumar
Pugalenthi, Ganesan
Hazrati, Mehrnaz Khodam
Kalies, Kai-Uwe
Martinetz, Thomas
author_facet Kandaswamy, Krishna Kumar
Pugalenthi, Ganesan
Hazrati, Mehrnaz Khodam
Kalies, Kai-Uwe
Martinetz, Thomas
author_sort Kandaswamy, Krishna Kumar
collection PubMed
description BACKGROUND: Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi, etc. also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identifying bioluminescent proteins is challenging because of their low sequence similarity. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. RESULTS: In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained on a dataset of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated on an independent set of 141 bioluminescent proteins and 18202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches: ReliefF, infogain, and mRMR. We selected five different feature subsets by decreasing the number of features, and evaluated the performance of each subset. CONCLUSION: BLProt achieves 80% accuracy in training (5-fold cross-validation) and 80.06% accuracy in testing. The performance of BLProt was compared with BLAST and HMM. The high prediction accuracy and the successful prediction of hypothetical proteins suggest that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. The BLProt software is available at http://www.inb.uni-luebeck.de/tools-demos/bioluminescent%20protein/BLProt
format Online
Article
Text
id pubmed-3176267
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3176267 2011-09-21 BLProt: prediction of bioluminescent proteins based on support vector machine and relieff feature selection Kandaswamy, Krishna Kumar Pugalenthi, Ganesan Hazrati, Mehrnaz Khodam Kalies, Kai-Uwe Martinetz, Thomas BMC Bioinformatics Research Article BioMed Central 2011-08-17 /pmc/articles/PMC3176267/ /pubmed/21849049 http://dx.doi.org/10.1186/1471-2105-12-345 Text en Copyright ©2011 Kandaswamy et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Kandaswamy, Krishna Kumar
Pugalenthi, Ganesan
Hazrati, Mehrnaz Khodam
Kalies, Kai-Uwe
Martinetz, Thomas
BLProt: prediction of bioluminescent proteins based on support vector machine and relieff feature selection
title BLProt: prediction of bioluminescent proteins based on support vector machine and relieff feature selection
title_full BLProt: prediction of bioluminescent proteins based on support vector machine and relieff feature selection
title_fullStr BLProt: prediction of bioluminescent proteins based on support vector machine and relieff feature selection
title_full_unstemmed BLProt: prediction of bioluminescent proteins based on support vector machine and relieff feature selection
title_short BLProt: prediction of bioluminescent proteins based on support vector machine and relieff feature selection
title_sort blprot: prediction of bioluminescent proteins based on support vector machine and relieff feature selection
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176267/
https://www.ncbi.nlm.nih.gov/pubmed/21849049
http://dx.doi.org/10.1186/1471-2105-12-345
work_keys_str_mv AT kandaswamykrishnakumar blprotpredictionofbioluminescentproteinsbasedonsupportvectormachineandreliefffeatureselection
AT pugalenthiganesan blprotpredictionofbioluminescentproteinsbasedonsupportvectormachineandreliefffeatureselection
AT hazratimehrnazkhodam blprotpredictionofbioluminescentproteinsbasedonsupportvectormachineandreliefffeatureselection
AT kalieskaiuwe blprotpredictionofbioluminescentproteinsbasedonsupportvectormachineandreliefffeatureselection
AT martinetzthomas blprotpredictionofbioluminescentproteinsbasedonsupportvectormachineandreliefffeatureselection
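
The description field in this record names ReliefF as one of the filter approaches used to rank features before SVM training. As a rough illustration of the idea only, the sketch below scores features with a simplified two-class, ReliefF-style update: a feature gains weight when it differs between an instance and its nearest neighbours of the other class, and loses weight when it differs from its nearest neighbours of the same class. The function name `relief_scores`, the parameter `k`, and the toy data are assumptions made for this example; this is not the BLProt implementation, and the record does not say which features or settings the authors used.

```python
def relief_scores(X, y, k=1):
    """Score features with a simplified two-class ReliefF-style update.

    X is a list of numeric feature vectors, y a list of class labels.
    Every instance is visited once (no random sampling), and the k nearest
    same-class ("hits") and other-class ("misses") neighbours are used.
    """
    n, d = len(X), len(X[0])

    # Range of each feature, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        # Manhattan distance over the range-scaled features.
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        others = sorted(
            (t for t in range(n) if t != i),
            key=lambda t: dist(X[i], X[t]),
        )
        hits = [t for t in others if y[t] == y[i]][:k]
        misses = [t for t in others if y[t] != y[i]][:k]
        for j in range(d):
            if hits:
                penalty = sum(diff(j, X[i], X[t]) for t in hits)
                weights[j] -= penalty / (len(hits) * n)
            if misses:
                reward = sum(diff(j, X[i], X[t]) for t in misses)
                weights[j] += reward / (len(misses) * n)
    return weights


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2], [1.0, 0.6]]
y = [0, 0, 0, 1, 1, 1]

scores = relief_scores(X, y, k=1)
# Feature indices ordered from most to least relevant.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)
```

On this toy data the class-separating feature ranks ahead of the noise feature. A filter of this kind would typically be run once on the training set, after which a classifier is trained on the top-ranked subset.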