Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate

The function of a protein is of great interest in the cutting-edge research of biological mechanisms, disease development and drug/target discovery. Besides experimental explorations, a variety of computational methods have been designed to predict protein function. Among these in silico methods, th...

Bibliographic Details
Main Authors: Yu, Chun Yan; Li, Xiao Xu; Yang, Hong; Li, Ying Hong; Xue, Wei Wei; Chen, Yu Zong; Tao, Lin; Zhu, Feng
Format: Online Article Text
Language: English
Published: MDPI 2018
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5796132/
https://www.ncbi.nlm.nih.gov/pubmed/29316706
http://dx.doi.org/10.3390/ijms19010183
author Yu, Chun Yan
Li, Xiao Xu
Yang, Hong
Li, Ying Hong
Xue, Wei Wei
Chen, Yu Zong
Tao, Lin
Zhu, Feng
collection PubMed
description Protein function is of great interest in research on biological mechanisms, disease development and drug/target discovery. Besides experimental exploration, a variety of computational methods have been designed to predict protein function. Among these in silico methods, BLAST bases its predictions on protein sequence similarity, whereas machine learning also works from the sequence but does not consider sequence similarity. This characteristic makes machine learning a good complement to BLAST and many other approaches when predicting the function of remotely relevant proteins and of homologous proteins with distinct functions. However, the identification accuracy and false discovery rate of these in silico methods have not yet been assessed, which greatly limits their use. Herein, the performance of four popular prediction algorithms (BLAST, SVM, PNN and KNN) was comprehensively compared. In particular, the methods were systematically assessed with four standard statistical indexes on independent test datasets of 93 functional protein families defined by UniProtKB keywords. Moreover, the false discovery rates of these algorithms were evaluated by scanning the genomes of four representative model organisms (Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae and Mycobacterium tuberculosis). SVM and BLAST showed substantially higher sensitivity than PNN and KNN. However, the machine learning algorithms (PNN, KNN and SVM) were found capable of substantially reducing the false discovery rate (SVM < PNN < KNN). In sum, this study comprehensively assessed the performance of four popular algorithms applied to protein function prediction, which could facilitate the selection of the most appropriate method in related biomedical research.
format Online
Article
Text
id pubmed-5796132
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-5796132 2018-02-09 Int J Mol Sci Article MDPI 2018-01-08 /pmc/articles/PMC5796132/ /pubmed/29316706 http://dx.doi.org/10.3390/ijms19010183 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate
topic Article