
An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology


Bibliographic Details
Main authors: Jain, Shobhit; Bader, Gary D
Format: Text
Language: English
Published: BioMed Central, 2010
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2998529/
https://www.ncbi.nlm.nih.gov/pubmed/21078182
http://dx.doi.org/10.1186/1471-2105-11-562
collection PubMed
description BACKGROUND: Semantic similarity measures are useful for assessing the physiological relevance of protein-protein interactions (PPIs). They quantify the functional similarity between proteins using annotation systems such as the Gene Ontology (GO). Proteins that interact in the cell are more likely than non-interacting proteins to share a location or to be involved in similar biological processes. Thus, the more semantically similar the gene function annotations of the interacting proteins are, the more likely the interaction is physiologically relevant. However, most semantic similarity measures used for PPI confidence assessment do not account for the unequal depth of term hierarchies across the branches of the GO cellular component, molecular function, and biological process ontologies, and thus may over- or underestimate similarity. RESULTS: We describe an improved algorithm, Topological Clustering Semantic Similarity (TCSS), to compute semantic similarity between GO terms annotated to proteins in interaction datasets. Our algorithm considers the unequal depth of biological knowledge representation in different branches of the GO graph. The central idea is to divide the GO graph into sub-graphs and to score a PPI higher if the participating proteins belong to the same sub-graph than if they belong to different sub-graphs. CONCLUSIONS: Among the semantic similarity measures we evaluated, TCSS performs best at distinguishing true from false protein interactions and in its correlation with gene expression and protein families. We show an average improvement of 4.6 times in F1 score over Resnik, the next best method, on our Saccharomyces cerevisiae PPI dataset and of 2 times on our Homo sapiens PPI dataset, using cellular component, biological process and molecular function GO annotations.
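The abstract contrasts TCSS with Resnik's measure and states the core idea of scoring a protein pair higher when its annotations fall in the same GO sub-graph. The Python sketch below illustrates both ideas on a toy ontology. Everything in it is an assumption for illustration: the terms (`root`, `A`, `A1`, and so on), the proteins `p1` to `p4`, the choice of `A` and `B` as sub-graph roots, the max-over-term-pairs aggregation, and especially `subgraph_score`, whose normalization is a hypothetical simplification and not the published TCSS formula. Only the Resnik part (information content of the most informative common ancestor) follows the standard definition.

```python
import math
from collections import defaultdict

# Toy GO-like DAG (hypothetical terms, not real GO IDs): term -> set of parents.
PARENTS = {
    "root": set(),
    "A": {"root"}, "B": {"root"},
    "A1": {"A"}, "A2": {"A"}, "B1": {"B"},
}

# Hypothetical annotations: protein -> directly annotated terms.
ANNOTATIONS = {"p1": {"A1"}, "p2": {"A2"}, "p3": {"B1"}, "p4": {"A1"}}


def ancestors(term):
    """Return the term itself plus every ancestor reachable through PARENTS."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(n / count(t)) = -log p(t), where count(t) is the number of
    proteins annotated to t or to any of its descendants."""
    counts = defaultdict(int)
    for terms in ANNOTATIONS.values():
        for t in set().union(*(ancestors(a) for a in terms)):
            counts[t] += 1
    n = len(ANNOTATIONS)
    return {t: math.log(n / c) for t, c in counts.items()}


def resnik_terms(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic.get(t, 0.0) for t in ancestors(t1) & ancestors(t2))


def resnik_proteins(p, q, ic):
    """Protein-level score: best term pair (one common aggregation choice)."""
    return max(resnik_terms(a, b, ic) for a in ANNOTATIONS[p] for b in ANNOTATIONS[q])


def max_ic_in_subgraph(root, ic):
    """Highest IC among terms lying under `root` (a proxy for the branch's depth)."""
    return max(v for t, v in ic.items() if root in ancestors(t))


def subgraph_score(p, q, ic, roots):
    """HYPOTHETICAL simplification of the sub-graph idea, not the TCSS formula.

    Term pairs sharing a sub-graph score 1 + Resnik / (max IC in that
    sub-graph), so they are always >= 1; other pairs score
    Resnik / (global max IC), so they are always <= 1.
    """
    global_max = max(ic.values()) or 1.0
    best = 0.0
    for a in ANNOTATIONS[p]:
        for b in ANNOTATIONS[q]:
            sim = resnik_terms(a, b, ic)
            shared = ancestors(a) & ancestors(b) & roots
            if shared:
                # Normalize within each shared sub-graph so shallow and deep branches are comparable.
                score = max(1.0 + min(1.0, sim / (max_ic_in_subgraph(r, ic) or 1.0)) for r in shared)
            else:
                score = sim / global_max
            best = max(best, score)
    return best


if __name__ == "__main__":
    ic = information_content()
    roots = {"A", "B"}  # hypothetical sub-graph roots
    print(round(resnik_proteins("p1", "p2", ic), 3))        # 0.288
    print(round(subgraph_score("p1", "p4", ic, roots), 3))  # 1.5
    print(round(subgraph_score("p1", "p2", ic, roots), 3))  # 1.208
    print(round(subgraph_score("p1", "p3", ic, roots), 3))  # 0.0
```

Because pairs that share a sub-graph always score at least 1 and all other pairs at most 1, the toy ranking places `p1`/`p4` and `p1`/`p2` above `p1`/`p3`, mirroring the ordering the abstract describes. Dividing by the highest IC inside each sub-graph is just one simple way to compensate for branches of different depth.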
id pubmed-2998529
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: BMC Bioinformatics (Methodology Article), BioMed Central, 2010-11-15
Copyright ©2010 Jain and Bader; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Article