
Prediction of Protein Essentiality by the Support Vector Machine with Statistical Tests

Essential proteins include the minimum required set of proteins to support cell life. Identifying essential proteins is important for understanding the cellular processes of an organism. However, identifying essential proteins experimentally is extremely time-consuming and labor-intensive. Alternati...


Bibliographic Details
Main Authors: Hor, Chiou-Yi, Yang, Chang-Biau, Yang, Zih-Jie, Tseng, Chiou-Ting
Format: Online Article Text
Language: English
Published: Libertas Academica 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3795531/
https://www.ncbi.nlm.nih.gov/pubmed/24250217
http://dx.doi.org/10.4137/EBO.S11975
author Hor, Chiou-Yi
Yang, Chang-Biau
Yang, Zih-Jie
Tseng, Chiou-Ting
collection PubMed
description Essential proteins comprise the minimum set of proteins required to support cell life. Identifying essential proteins is important for understanding the cellular processes of an organism. However, identifying essential proteins experimentally is extremely time-consuming and labor-intensive, so alternative methods must be developed to examine them. This study had two goals: identifying the important features and building learning machines for discriminating essential proteins. Data for Saccharomyces cerevisiae and Escherichia coli were used. We first collected information from a variety of sources. We then proposed a modified backward feature selection method and built support vector machine (SVM) predictors based on the selected features. To evaluate performance, we conducted cross-validation on the original imbalanced data set and on a balanced data set obtained by down-sampling. Statistical tests were applied to the performance associated with the obtained feature subsets to confirm their significance. In the first data set, our best values of F-measure and Matthews correlation coefficient (MCC) were 0.549 and 0.495 in the imbalanced experiments. For the balanced experiment, the best values of F-measure and MCC were 0.770 and 0.545, respectively. In the second data set, our best values of F-measure and MCC were 0.421 and 0.407 in the imbalanced experiments. For the balanced experiment, the best values of F-measure and MCC were 0.718 and 0.448, respectively. The experimental results show that our selected features are compact and that performance improved. Users can also run predictions at the following internet address: http://bio2.cse.nsysu.edu.tw/esspredict.aspx.
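The abstract reports F-measure and MCC on imbalanced and down-sampled data. The sketch below shows how these two metrics are commonly computed from confusion-matrix counts, and one simple way to down-sample a majority class. It is an illustration only, not the authors' code: the function names, the 0/1 label encoding, the random seed, and the toy counts are all assumptions and do not come from the paper.

```python
import math
import random


def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def down_sample(labels, seed=0):
    """Indices of a balanced subset: every minority-class item plus an
    equally sized random sample of the majority class (labels are 0/1)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return sorted(minority + rng.sample(majority, len(minority)))


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced test set (not the paper's data).
    tp, fp, fn, tn = 40, 20, 60, 880
    print("F-measure:", round(f_measure(tp, fp, fn), 3))
    print("MCC:", round(mcc(tp, tn, fp, fn), 3))

    # Hypothetical labels: 3 essential (1) versus 7 non-essential (0).
    labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    print("Balanced subset indices:", down_sample(labels))
```

MCC uses all four confusion-matrix cells, which is why it is often reported next to F-measure when the classes are imbalanced: F-measure ignores true negatives, so the two can diverge when the negative class dominates.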
format Online
Article
Text
id pubmed-3795531
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-3795531 2013-11-18 Evol Bioinform Online Original Research
Libertas Academica 2013-10-03 /pmc/articles/PMC3795531/ /pubmed/24250217 http://dx.doi.org/10.4137/EBO.S11975 Text en © 2013 the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article published under the Creative Commons CC-BY-NC 3.0 license.
title Prediction of Protein Essentiality by the Support Vector Machine with Statistical Tests
topic Original Research