Negative Example Selection for Protein Function Prediction: The NoGO Database

Negative examples – genes that are known not to carry out a given protein function – are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative...

Bibliographic Details
Main Authors:	Youngs, Noah; Penfold-Brown, Duncan; Bonneau, Richard; Shasha, Dennis
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2014
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4055410/ https://www.ncbi.nlm.nih.gov/pubmed/24922051 http://dx.doi.org/10.1371/journal.pcbi.1003644

author	Youngs, Noah; Penfold-Brown, Duncan; Bonneau, Richard; Shasha, Dennis
collection	PubMed
description	Negative examples – genes that are known not to carry out a given protein function – are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text classification, which are the current state-of-the-art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html).
format	Online Article Text
id	pubmed-4055410
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4055410 2014-06-18 PLoS Comput Biol Research Article Public Library of Science 2014-06-12 /pmc/articles/PMC4055410/ /pubmed/24922051 http://dx.doi.org/10.1371/journal.pcbi.1003644 Text en © 2014 Youngs et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Negative Example Selection for Protein Function Prediction: The NoGO Database
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4055410/ https://www.ncbi.nlm.nih.gov/pubmed/24922051 http://dx.doi.org/10.1371/journal.pcbi.1003644
