The impact of incomplete knowledge on the evaluation of protein function prediction: a structured-output learning perspective

Motivation: The automated functional annotation of biological macromolecules is a problem of computational assignment of biological concepts or ontological terms to genes and gene products. A number of methods have been developed to computationally annotate genes using standardized nomenclature such...

Bibliographic Details
Main authors:	Jiang, Yuxiang, Clark, Wyatt T., Friedberg, Iddo, Radivojac, Predrag
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2014
Subjects:	Eccb 2014 Proceedings Papers Committee
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4147924/ https://www.ncbi.nlm.nih.gov/pubmed/25161254 http://dx.doi.org/10.1093/bioinformatics/btu472

author	Jiang, Yuxiang Clark, Wyatt T. Friedberg, Iddo Radivojac, Predrag
collection	PubMed
description	Motivation: The automated functional annotation of biological macromolecules is a problem of computational assignment of biological concepts or ontological terms to genes and gene products. A number of methods have been developed to computationally annotate genes using standardized nomenclature such as Gene Ontology (GO). However, questions remain about the possibility for development of accurate methods that can integrate disparate molecular data as well as about an unbiased evaluation of these methods. One important concern is that experimental annotations of proteins are incomplete. This raises questions as to whether and to what degree currently available data can be reliably used to train computational models and estimate their performance accuracy. Results: We study the effect of incomplete experimental annotations on the reliability of performance evaluation in protein function prediction. Using the structured-output learning framework, we provide theoretical analyses and carry out simulations to characterize the effect of growing experimental annotations on the correctness and stability of performance estimates corresponding to different types of methods. We then analyze real biological data by simulating the prediction, evaluation and subsequent re-evaluation (after additional experimental annotations become available) of GO term predictions. Our results agree with previous observations that incomplete and accumulating experimental annotations have the potential to significantly impact accuracy assessments. We find that their influence reflects a complex interplay between the prediction algorithm, performance metric and underlying ontology. However, using the available experimental data and under realistic assumptions, our results also suggest that current large-scale evaluations are meaningful and almost surprisingly reliable. Contact: predrag@indiana.edu Supplementary information: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-4147924
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-4147924 2014-09-02 Bioinformatics Oxford University Press 2014-09-01 2014-08-22 /pmc/articles/PMC4147924/ /pubmed/25161254 http://dx.doi.org/10.1093/bioinformatics/btu472 Text en © The Author 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	The impact of incomplete knowledge on the evaluation of protein function prediction: a structured-output learning perspective
topic	Eccb 2014 Proceedings Papers Committee
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4147924/ https://www.ncbi.nlm.nih.gov/pubmed/25161254 http://dx.doi.org/10.1093/bioinformatics/btu472

Similar Items