
Metrics for GO based protein semantic similarity: a systematic evaluation

Bibliographic Details
Main authors: Pesquita, Catia; Faria, Daniel; Bastos, Hugo; Ferreira, António EN; Falcão, André O; Couto, Francisco M
Format: Text
Language: English
Published: BioMed Central 2008
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367622/
https://www.ncbi.nlm.nih.gov/pubmed/18460186
http://dx.doi.org/10.1186/1471-2105-9-S5-S4
author Pesquita, Catia
Faria, Daniel
Bastos, Hugo
Ferreira, António EN
Falcão, André O
Couto, Francisco M
collection PubMed
description BACKGROUND: Several semantic similarity measures have been applied to gene products annotated with Gene Ontology (GO) terms, providing a basis for their functional comparison. However, it is still unclear which approach to semantic similarity is best in this context, since there is no conclusive evaluation of the various measures. Another open issue is whether electronic annotations should be used in semantic similarity calculations.
RESULTS: We conducted a systematic evaluation of GO-based semantic similarity measures, using their relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing the measures with and without these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that most of the semantic similarity measures capture the same behaviour but differ in resolution, we used resolution as the main evaluation criterion.
CONCLUSIONS: This work provides a basis for comparing several semantic similarity measures and can help researchers choose the most adequate measure for their work. We found that the hybrid measure simGIC had the best overall performance, followed by Resnik's measure with a best-match average combination approach. We also found that the average and maximum combination approaches are problematic, since both are inherently influenced by the number of terms being combined. We suspect that data circularity directly influences the results obtained with electronic annotations, because many of these annotations are themselves inferred from sequence similarity.
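The abstract names several measures without defining them. The sketch below illustrates the textbook definitions on a made-up GO fragment: information content IC(t) = -log p(t); Resnik's measure (IC of the most informative common ancestor) combined by maximum, average, or best-match average (BMA); and simGIC (summed IC of the terms shared by both proteins, after extending each protein's annotations with all ancestors, divided by the summed IC of all their terms). The ontology, the annotation frequencies, and the function names (`resnik`, `combine`, `sim_gic`, `rescaled_normal_cdf`) are hypothetical, not the authors' code; BMA is given in one common formulation, and the lo/hi/mu/sigma parameterisation of the rescaled Normal CDF is an assumption, since the abstract does not state it.

```python
import math

# Hypothetical toy GO fragment (is_a edges only): term -> set of parent terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}

# Hypothetical annotation frequencies p(t): fraction of proteins annotated with t
# or any of its descendants, so p(root) = 1 and p never increases going down.
FREQ = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25, "B1": 0.125}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def ic(term):
    """Information content IC(t) = -log p(t)."""
    return -math.log(FREQ[term])


def resnik(t1, t2):
    """Resnik: IC of the most informative common ancestor of two terms."""
    return max(ic(a) for a in ancestors(t1) & ancestors(t2))


def combine(terms_a, terms_b, how):
    """Combine pairwise Resnik scores between two proteins' annotation sets."""
    scores = [[resnik(a, b) for b in terms_b] for a in terms_a]
    flat = [s for row in scores for s in row]
    if how == "max":
        return max(flat)
    if how == "avg":
        return sum(flat) / len(flat)
    if how == "bma":
        # Best-match average: mean of the best match for every term on both sides.
        row_best = [max(row) for row in scores]
        col_best = [max(col) for col in zip(*scores)]
        return (sum(row_best) + sum(col_best)) / (len(row_best) + len(col_best))
    raise ValueError(f"unknown combination approach: {how}")


def sim_gic(terms_a, terms_b):
    """simGIC: summed IC of shared terms over summed IC of all terms,
    where each protein's term set is extended with all ancestors."""
    go_a = set().union(*(ancestors(t) for t in terms_a))
    go_b = set().union(*(ancestors(t) for t in terms_b))
    union_ic = sum(ic(t) for t in go_a | go_b)
    # If only zero-IC terms (the root) are involved, report no similarity.
    return sum(ic(t) for t in go_a & go_b) / union_ic if union_ic else 0.0


def rescaled_normal_cdf(x, mu, sigma, lo, hi):
    """Assumed form: Normal CDF with mean mu and s.d. sigma, rescaled to [lo, hi]."""
    phi = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return lo + (hi - lo) * phi


if __name__ == "__main__":
    p, q_small, q_large = ["A1"], ["A1"], ["A1", "B1"]
    for how in ("max", "avg", "bma"):
        print(how, round(combine(p, q_small, how), 3), round(combine(p, q_large, how), 3))
    print("simGIC", round(sim_gic(p, q_large), 3))
```

With these toy numbers, adding the unrelated term B1 to the second protein halves the average score and leaves the maximum unchanged, while BMA drops less. This illustrates the dependence on the number of combined terms: averaging dilutes strong matches as terms are added, and the maximum can only stay level or rise as more terms offer more chances for a single strong match.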
format Text
id pubmed-2367622
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2367622 2008-05-07: BMC Bioinformatics, Proceedings. BioMed Central, published 2008-04-29. Copyright © 2008 Pesquita et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Metrics for GO based protein semantic similarity: a systematic evaluation
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367622/
https://www.ncbi.nlm.nih.gov/pubmed/18460186
http://dx.doi.org/10.1186/1471-2105-9-S5-S4