
Novel comparison of evaluation metrics for gene ontology classifiers reveals drastic performance differences

Automated protein annotation using the Gene Ontology (GO) plays an important role in the biosciences. Evaluation has always been considered central to developing novel annotation methods, but little attention has been paid to the evaluation metrics themselves. Evaluation metrics define how well an annotation method performs and allows for them to be ranked against one another. Unfortunately, most of these metrics were adopted from the machine learning literature without establishing whether they were appropriate for GO annotations. We propose a novel approach for comparing GO evaluation metrics called Artificial Dilution Series (ADS). Our approach uses existing annotation data to generate a series of annotation sets with different levels of correctness (referred to as their signal level). We calculate the evaluation metric being tested for each annotation set in the series, allowing us to identify whether it can separate different signal levels. Finally, we contrast these results with several false positive annotation sets, which are designed to expose systematic weaknesses in GO assessment. We compared 37 evaluation metrics for GO annotation using ADS and identified drastic differences between metrics. We show that some metrics struggle to differentiate between different signal levels, while others give erroneously high scores to the false positive data sets. Based on our findings, we provide guidelines on which evaluation metrics perform well with the Gene Ontology and propose improvements to several well-known evaluation metrics. In general, we argue that evaluation metrics should be tested for their performance and we provide software for this purpose (https://bitbucket.org/plyusnin/ads/). ADS is applicable to other areas of science where the evaluation of prediction results is non-trivial.
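
The abstract sketches how ADS works: take existing annotation data, mix in noise to produce annotation sets at known signal (correctness) levels, score each set with the metric under test, and check whether the scores separate the levels. The short Python sketch below illustrates that idea under simplifying assumptions (random term replacement, a flat set-based F1 metric, gene-centric averaging, and made-up helper names such as dilute and mean_score); it is not the authors' software, which is available at https://bitbucket.org/plyusnin/ads/.

# Minimal conceptual sketch of an Artificial Dilution Series (ADS), loosely
# following the idea in the abstract: generate annotation sets at different
# signal (correctness) levels and check whether an evaluation metric can
# separate them. Illustration only -- the random-replacement scheme, the flat
# set-based F1 metric, and the gene-centric averaging are assumptions of this
# sketch, not the authors' implementation (https://bitbucket.org/plyusnin/ads/).
import random
from typing import Dict, List, Set

def dilute(gold: Dict[str, Set[str]], vocabulary: List[str],
           signal: float, rng: random.Random) -> Dict[str, Set[str]]:
    """Keep roughly a fraction `signal` of each gene's true GO terms and
    replace the rest with terms drawn at random from the vocabulary."""
    diluted = {}
    for gene, terms in gold.items():
        kept = {t for t in terms if rng.random() < signal}
        noise = set(rng.sample(vocabulary, len(terms) - len(kept)))
        diluted[gene] = kept | noise
    return diluted

def set_f1(pred: Set[str], true: Set[str]) -> float:
    """Plain set-based F1 between predicted and true term sets."""
    tp = len(pred & true)
    if tp == 0 or not pred or not true:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)

def mean_score(metric, predictions: Dict[str, Set[str]],
               gold: Dict[str, Set[str]]) -> float:
    """Gene-centric average of a per-gene metric over the gold standard."""
    return sum(metric(predictions[g], gold[g]) for g in gold) / len(gold)

if __name__ == "__main__":
    rng = random.Random(0)
    vocabulary = [f"GO:{i:07d}" for i in range(1, 201)]
    # Toy gold standard: 50 genes with 5 GO terms each.
    gold = {f"gene{i}": set(rng.sample(vocabulary, 5)) for i in range(50)}
    # A usable metric should yield scores that rise with the signal level.
    for signal in (0.0, 0.25, 0.5, 0.75, 1.0):
        score = mean_score(set_f1, dilute(gold, vocabulary, signal, rng), gold)
        print(f"signal={signal:.2f}  mean F1={score:.3f}")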

Bibliographic Details
Main Authors: Plyusnin, Ilya; Holm, Liisa; Törönen, Petri
Format: Online Article (Text)
Language: English
Published: Public Library of Science, 2019-11-04
Journal: PLoS Comput Biol
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6855565/
https://www.ncbi.nlm.nih.gov/pubmed/31682632
http://dx.doi.org/10.1371/journal.pcbi.1007419
License: © 2019 Plyusnin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.