
Large-scale protein function prediction using heterogeneous ensembles

Bibliographic details
Main authors: Wang, Linhua; Law, Jeffrey; Kale, Shiv D.; Murali, T. M.; Pandey, Gaurav
Format: Online, Article, Text
Language: English
Published: F1000 Research Limited, 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6221071/
https://www.ncbi.nlm.nih.gov/pubmed/30450194
http://dx.doi.org/10.12688/f1000research.16415.1
Abstract: Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor are unclear for a given problem. These ensembles have shown promise for protein function prediction (PFP), but their ability to improve PFP at a large scale is unclear. The overall goal of this study is to critically assess this ability of a variety of heterogeneous ensemble methods across a multitude of functional terms, proteins and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred (https://github.com/GauravPandeyLab/LargeGOPred).
Journal: F1000Res (F1000 Research Limited), Method Article, 2018-09-28
Source: PubMed collection, National Center for Biotechnology Information (record pubmed-6221071, MEDLINE/PubMed format)
Copyright: © 2018 Wang L et al. This is an open access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.