IntelliGO: a new vector-based semantic similarity measure including annotation origin

Bibliographic Details
Main Authors: Benabderrahmane, Sidahmed; Smail-Tabbone, Malika; Poch, Olivier; Napoli, Amedeo; Devignes, Marie-Dominique
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Methodology Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098105/
https://www.ncbi.nlm.nih.gov/pubmed/21122125
http://dx.doi.org/10.1186/1471-2105-11-588
collection PubMed
description BACKGROUND: The Gene Ontology (GO) is a well-known controlled vocabulary describing the biological process, molecular function and cellular component aspects of gene annotation. It has become a widely used knowledge source in bioinformatics for annotating genes and measuring their semantic similarity. These measures generally involve the GO graph structure, the information content of GO aspects, or a combination of both. However, only a few of the semantic similarity measures described so far can handle GO annotations differently according to their origin (i.e., their evidence codes). RESULTS: We present here a new semantic similarity measure called IntelliGO which integrates several complementary properties in a novel vector space model. The coefficients associated with each GO term that annotates a given gene or protein include its information content as well as a customized value for each type of GO evidence code. The generalized cosine similarity measure, used for calculating the dot product between two vectors, has been rigorously adapted to the context of the GO graph. The IntelliGO similarity measure is tested on two benchmark datasets consisting of KEGG pathways and Pfam domains grouped as clans, considering the GO biological process and molecular function terms, respectively, for a total of 683 yeast and human genes and involving more than 67,900 pairwise comparisons. The ability of the IntelliGO similarity measure to express the biological cohesion of sets of genes compares favourably to four existing similarity measures. For inter-set comparison, it consistently discriminates between distinct sets of genes. Furthermore, the IntelliGO similarity measure makes it possible to check the influence of the weights assigned to evidence codes. 
Finally, the results obtained with a complementary reference technique give intermediate but correct correlation values with sequence similarity and with the Pfam and Enzyme classifications, compared with previously published measures. CONCLUSIONS: The IntelliGO similarity measure provides a customizable and comprehensive method for quantifying gene similarity based on GO annotations. It also displays a robust set-discriminating power, which suggests it will be useful for functional clustering. AVAILABILITY: An online version of the IntelliGO similarity measure is available at: http://bioinfo.loria.fr/Members/benabdsi/intelligo_project/
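The vector model summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy ontology (terms `A`, `A1`, and so on), the information-content values, the evidence-code weights, the choice to multiply weight by information content, and the term-to-term dot product (built from the depth of the deepest common ancestor and the shortest path through a common ancestor) are all hypothetical stand-ins chosen for illustration. Each gene becomes a vector of per-term coefficients, and two genes are compared with a generalized cosine in which the basis vectors of related GO terms are not treated as orthogonal.

```python
import math
from collections import deque

# Hypothetical toy DAG standing in for one GO aspect: term -> list of parent terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}

# Hypothetical information-content values per term (assumed, not computed from real data).
IC = {"A": 1.0, "B": 1.0, "A1": 2.0, "A2": 1.0, "B1": 2.0}

# Hypothetical weights per GO evidence code: experimental evidence weighs more than IEA.
EVIDENCE_WEIGHT = {"EXP": 1.0, "IDA": 1.0, "ISS": 0.8, "IEA": 0.4}


def ancestors(term):
    """Map every ancestor of `term` (itself included) to its shortest upward distance."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        t = queue.popleft()
        for parent in PARENTS[t]:
            if parent not in dist:
                dist[parent] = dist[t] + 1
                queue.append(parent)
    return dist


def depth(term):
    """Shortest distance from `term` up to the root."""
    return ancestors(term)["root"]


def term_dot(ti, tj):
    """Assumed dot product e_i . e_j between the basis vectors of two terms, in [0, 1]."""
    if ti == tj:
        return 1.0
    ai, aj = ancestors(ti), ancestors(tj)
    common = set(ai) & set(aj)
    d = max(depth(c) for c in common)         # depth of the deepest common ancestor
    spl = min(ai[c] + aj[c] for c in common)  # shortest path between the terms via a common ancestor
    return 2 * d / (spl + 2 * d) if d > 0 else 0.0


def gene_vector(annotations):
    """Build {term: coefficient} from (term, evidence_code) pairs, keeping the best-supported weight per term."""
    vec = {}
    for term, code in annotations:
        coef = EVIDENCE_WEIGHT[code] * IC[term]
        vec[term] = max(vec.get(term, 0.0), coef)
    return vec


def gdot(u, v):
    """Generalized dot product: cross terms are weighted by term_dot instead of being assumed orthogonal."""
    return sum(a * b * term_dot(ti, tj) for ti, a in u.items() for tj, b in v.items())


def similarity(u, v):
    """Generalized cosine similarity between two gene vectors."""
    return gdot(u, v) / math.sqrt(gdot(u, u) * gdot(v, v))


gene_x = gene_vector([("A1", "IDA"), ("A2", "IEA")])
gene_y = gene_vector([("A1", "ISS")])
gene_z = gene_vector([("B1", "IDA")])
print(round(similarity(gene_x, gene_y), 3))  # → 0.988
print(round(similarity(gene_x, gene_z), 3))  # → 0.0
```

In this toy setup, `gene_x` and `gene_y` share term `A1` and score close to 1, while `gene_z`, annotated only in an unrelated branch, scores 0. Changing the hypothetical `IEA` weight changes how much the electronically inferred annotation of `gene_x` contributes, which mirrors the kind of evidence-code weighting the abstract describes.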
id pubmed-3098105
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Bioinformatics (Methodology Article), published 2010-12-01
rights Copyright ©2010 Benabderrahmane et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.