Minimalist ensemble algorithms for genome-wide protein localization prediction
BACKGROUND: Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to i...
Main authors: | Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426488/ https://www.ncbi.nlm.nih.gov/pubmed/22759391 http://dx.doi.org/10.1186/1471-2105-13-157 |
_version_ | 1782241515005804544 |
---|---|
author | Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun |
author_facet | Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun |
author_sort | Lin, Jhih-Rong |
collection | PubMed |
description | BACKGROUND: Computational prediction of protein subcellular localization can greatly help to elucidate protein functions. Despite the existence of dozens of protein localization prediction algorithms, prediction accuracy and coverage remain low. Several ensemble algorithms have been proposed to improve prediction performance; these usually include 10 or more individual localization algorithms. However, their performance is still limited by running complexity and by redundancy among the individual prediction algorithms. RESULTS: This paper proposes a novel method for the rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm combines a feature-selection-based filter with a logistic regression classifier. Using a novel concept of contribution scores, we analyzed algorithm redundancy, consensus mistakes, and algorithm complementarity in the design of ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets, Yeast and Human, and compared its performance with that of current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only one-third to one-half of the individual predictors used by current ensemble algorithms, which greatly reduces computational complexity and running time. High-performance ensemble algorithms were found to be usually composed of predictors that together cover most of the available features. Compared with the best individual predictor, our ensemble algorithm improved the AUC score from 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted-voting-based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from the inclusion of too many individual predictors. CONCLUSIONS: We proposed a method for the rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only one-third to one-half of the individual predictors used by other ensemble algorithms. The results also suggest that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. |
format | Online Article Text |
id | pubmed-3426488 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3426488 2012-08-24 Minimalist ensemble algorithms for genome-wide protein localization prediction Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun BMC Bioinformatics Methodology Article BioMed Central 2012-07-03 /pmc/articles/PMC3426488/ /pubmed/22759391 http://dx.doi.org/10.1186/1471-2105-13-157 Text en Copyright ©2012 Lin et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Lin, Jhih-Rong Mondal, Ananda Mohan Liu, Rong Hu, Jianjun Minimalist ensemble algorithms for genome-wide protein localization prediction |
title | Minimalist ensemble algorithms for genome-wide protein localization prediction |
title_full | Minimalist ensemble algorithms for genome-wide protein localization prediction |
title_fullStr | Minimalist ensemble algorithms for genome-wide protein localization prediction |
title_full_unstemmed | Minimalist ensemble algorithms for genome-wide protein localization prediction |
title_short | Minimalist ensemble algorithms for genome-wide protein localization prediction |
title_sort | minimalist ensemble algorithms for genome-wide protein localization prediction |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426488/ https://www.ncbi.nlm.nih.gov/pubmed/22759391 http://dx.doi.org/10.1186/1471-2105-13-157 |
work_keys_str_mv | AT linjhihrong minimalistensemblealgorithmsforgenomewideproteinlocalizationprediction AT mondalanandamohan minimalistensemblealgorithmsforgenomewideproteinlocalizationprediction AT liurong minimalistensemblealgorithmsforgenomewideproteinlocalizationprediction AT hujianjun minimalistensemblealgorithmsforgenomewideproteinlocalizationprediction |
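
The abstract in this record describes an ensemble that filters the individual predictors with feature selection and then combines the survivors with a logistic regression classifier, evaluated by AUC. The following is a minimal sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic scores from four hypothetical predictors, the greedy forward selection used as a stand-in for the paper's filter (which the record does not specify), the `min_gain` threshold, and all function names.

```python
import math
import random

def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_lr(X, y, epochs=200, rate=0.5):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - rate * g / len(X) for wj, g in zip(w, grad_w)]
        b -= rate * grad_b / len(X)
    return w, b

def lr_scores(model, X):
    """Ensemble score for each row: the fitted logistic regression output."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def columns(X, idx):
    """Keep only the predictor columns listed in idx."""
    return [[row[j] for j in idx] for row in X]

def select_predictors(X_tr, y_tr, X_va, y_va, min_gain=0.01):
    """Greedy forward selection: keep adding the predictor that raises validation AUC most."""
    selected, best = [], 0.5  # 0.5 is the AUC of a random ranking
    remaining = list(range(len(X_tr[0])))
    while remaining:
        trials = []
        for j in remaining:
            idx = selected + [j]
            model = fit_lr(columns(X_tr, idx), y_tr)
            trials.append((auc(y_va, lr_scores(model, columns(X_va, idx))), j))
        top_auc, top_j = max(trials)
        if top_auc - best < min_gain:
            break  # no remaining predictor adds enough; stop with a small ensemble
        selected.append(top_j)
        remaining.remove(top_j)
        best = top_auc
    return selected, best

def make_data(n, rng):
    """Synthetic scores from four hypothetical predictors for one binary localization label."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        a = label + rng.gauss(0, 0.8)   # informative predictor
        b = a + rng.gauss(0, 0.05)      # near-duplicate of a (redundant)
        c = label + rng.gauss(0, 0.8)   # independent, complementary predictor
        d = rng.gauss(0, 1.0)           # uninformative predictor
        X.append([a, b, c, d])
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_val, y_val = make_data(200, rng)
selected, ensemble_auc = select_predictors(X_train, y_train, X_val, y_val)
print("selected predictor columns:", selected)
print("validation AUC of the LR ensemble: %.3f" % ensemble_auc)
```

In this toy setup the redundant and uninformative columns exist only to show how a selection step can leave predictors out; a real benchmark would use the outputs of actual localization predictors and a held-out test set separate from the data used for selection.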