A close look at protein function prediction evaluation protocols
Main Authors: | Kahanda, Indika; Funk, Christopher S; Ullah, Fahad; Verspoor, Karin M; Ben-Hur, Asa |
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2015 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4570743/ https://www.ncbi.nlm.nih.gov/pubmed/26380075 http://dx.doi.org/10.1186/s13742-015-0082-5 |
_version_ | 1782390254395719680 |
author | Kahanda, Indika Funk, Christopher S Ullah, Fahad Verspoor, Karin M Ben-Hur, Asa |
author_facet | Kahanda, Indika Funk, Christopher S Ullah, Fahad Verspoor, Karin M Ben-Hur, Asa |
author_sort | Kahanda, Indika |
collection | PubMed |
description | BACKGROUND: The recently held Critical Assessment of Function Annotation challenge (CAFA2) required its participants to submit predictions for a large number of target proteins regardless of whether they have previous annotations or not. This is in contrast to the original CAFA challenge in which participants were asked to submit predictions for proteins with no existing annotations. The CAFA2 task is more realistic, in that it more closely mimics the accumulation of annotations over time. In this study we compare these tasks in terms of their difficulty, and determine whether cross-validation provides a good estimate of performance. RESULTS: The CAFA2 task is a combination of two subtasks: making predictions on annotated proteins and making predictions on previously unannotated proteins. In this study we analyze the performance of several function prediction methods in these two scenarios. Our results show that several methods (structured support vector machine, binary support vector machines and guilt-by-association methods) do not usually achieve the same level of accuracy on these two tasks as that achieved by cross-validation, and that predicting novel annotations for previously annotated proteins is a harder problem than predicting annotations for uncharacterized proteins. We also find that different methods have different performance characteristics in these tasks, and that cross-validation is not adequate at estimating performance and ranking methods. CONCLUSIONS: These results have implications for the design of computational experiments in the area of automated function prediction and can provide useful insight for the understanding and design of future CAFA competitions. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13742-015-0082-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4570743 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4570743 2015-09-16 A close look at protein function prediction evaluation protocols. Kahanda, Indika; Funk, Christopher S; Ullah, Fahad; Verspoor, Karin M; Ben-Hur, Asa. Gigascience, Research (abstract as in the description field above). BioMed Central 2015-09-14 /pmc/articles/PMC4570743/ /pubmed/26380075 http://dx.doi.org/10.1186/s13742-015-0082-5 Text en © Kahanda et al. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Kahanda, Indika Funk, Christopher S Ullah, Fahad Verspoor, Karin M Ben-Hur, Asa A close look at protein function prediction evaluation protocols |
title | A close look at protein function prediction evaluation protocols |
title_full | A close look at protein function prediction evaluation protocols |
title_fullStr | A close look at protein function prediction evaluation protocols |
title_full_unstemmed | A close look at protein function prediction evaluation protocols |
title_short | A close look at protein function prediction evaluation protocols |
title_sort | close look at protein function prediction evaluation protocols |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4570743/ https://www.ncbi.nlm.nih.gov/pubmed/26380075 http://dx.doi.org/10.1186/s13742-015-0082-5 |
work_keys_str_mv | AT kahandaindika acloselookatproteinfunctionpredictionevaluationprotocols AT funkchristophers acloselookatproteinfunctionpredictionevaluationprotocols AT ullahfahad acloselookatproteinfunctionpredictionevaluationprotocols AT verspoorkarinm acloselookatproteinfunctionpredictionevaluationprotocols AT benhurasa acloselookatproteinfunctionpredictionevaluationprotocols AT kahandaindika closelookatproteinfunctionpredictionevaluationprotocols AT funkchristophers closelookatproteinfunctionpredictionevaluationprotocols AT ullahfahad closelookatproteinfunctionpredictionevaluationprotocols AT verspoorkarinm closelookatproteinfunctionpredictionevaluationprotocols AT benhurasa closelookatproteinfunctionpredictionevaluationprotocols |
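The abstract above contrasts three evaluation settings for protein function prediction: cross-validation on a snapshot of annotations, predicting annotations for previously unannotated (no-knowledge) proteins, and predicting new annotations for previously annotated (limited-knowledge) proteins. The sketch below is a minimal, purely illustrative example of that distinction on synthetic data; it is not the authors' pipeline, and the choice of per-term logistic regression, ROC AUC as the metric, and all variable names are assumptions made only for this example.

```python
# Purely illustrative sketch on synthetic data (not the authors' code or data):
# it contrasts a cross-validation estimate with a CAFA2-style temporal
# evaluation split into no-knowledge and limited-knowledge targets.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_proteins, n_features, n_terms = 600, 40, 5

X = rng.normal(size=(n_proteins, n_features))       # protein features
W = rng.normal(size=(n_features, n_terms))          # hidden per-term weights
Y_true = (rng.random((n_proteins, n_terms)) < 1 / (1 + np.exp(-X @ W))).astype(int)

# Annotations known at time t0: each true annotation has already been curated
# with probability 0.5; the remainder only become visible by time t1.
Y_t0 = ((rng.random(Y_true.shape) < 0.5) & (Y_true == 1)).astype(int)
new_by_t1 = Y_true - Y_t0                            # annotations gained between t0 and t1

limited_knowledge = Y_t0.sum(axis=1) > 0             # had some annotation at t0 (CAFA2 addition)
no_knowledge = ~limited_knowledge                    # had none at t0 (original CAFA task)

def auc_on(mask, y_new, scores):
    """AUC against annotations gained by t1, restricted to one group of targets."""
    y = y_new[mask]
    return roc_auc_score(y, scores[mask]) if 0 < y.sum() < y.size else float("nan")

for term in range(n_terms):
    clf = LogisticRegression(max_iter=1000)

    # (1) Cross-validation on the t0 snapshot -- the estimate the paper argues
    #     can be optimistic relative to the temporal tasks.
    cv_auc = cross_val_score(clf, X, Y_t0[:, term], cv=5, scoring="roc_auc").mean()

    # (2) Temporal evaluation: train on the t0 labels, score every protein, and
    #     judge the scores against annotations that only appeared by t1.  For
    #     simplicity, annotations already known at t0 count as negatives here,
    #     which is a simplification of the actual CAFA rules.
    scores = clf.fit(X, Y_t0[:, term]).predict_proba(X)[:, 1]

    print(f"term {term}: CV={cv_auc:.2f}  "
          f"no-knowledge={auc_on(no_knowledge, new_by_t1[:, term], scores):.2f}  "
          f"limited-knowledge={auc_on(limited_knowledge, new_by_t1[:, term], scores):.2f}")
```

Scoring every protein with a model trained on the t0 snapshot mirrors how CAFA targets are drawn from the same pool of sequences that methods may already have used for training; the sketch is only meant to make the three evaluation splits concrete, not to reproduce the paper's results.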