An improved approach to infer protein-protein interaction based on a hierarchical vector space model

BACKGROUND: Comparing and classifying functions of gene products are important in today’s biomedical research. The semantic similarity derived from the Gene Ontology (GO) annotation has been regarded as one of the most widely used indicators for protein interaction. Among the various approaches prop...

Bibliographic Details
Main Authors: Zhang, Jiongmin, Jia, Ke, Jia, Jinmeng, Qian, Ying
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5921294/
https://www.ncbi.nlm.nih.gov/pubmed/29699476
http://dx.doi.org/10.1186/s12859-018-2152-z
_version_ 1783317977730383872
author Zhang, Jiongmin
Jia, Ke
Jia, Jinmeng
Qian, Ying
author_facet Zhang, Jiongmin
Jia, Ke
Jia, Jinmeng
Qian, Ying
author_sort Zhang, Jiongmin
collection PubMed
description BACKGROUND: Comparing and classifying functions of gene products are important in today’s biomedical research. The semantic similarity derived from the Gene Ontology (GO) annotation has been regarded as one of the most widely used indicators for protein interaction. Among the various approaches proposed, those based on the vector space model are relatively simple, but their effectiveness is far from satisfying. RESULTS: We propose a Hierarchical Vector Space Model (HVSM) for computing semantic similarity between different genes or their products, which enhances the basic vector space model by introducing the relation between GO terms. Besides the directly annotated terms, HVSM also takes their ancestors and descendants related by “is_a” and “part_of” relations into account. Moreover, HVSM introduces the concept of a Certainty Factor to calibrate the semantic similarity based on the number of terms annotated to genes. To assess the performance of our method, we applied HVSM to Homo sapiens and Saccharomyces cerevisiae protein-protein interaction datasets. Compared with TCSS, Resnik, and other classic similarity measures, HVSM achieved significant improvement for distinguishing positive from negative protein interactions. We also tested its correlation with sequence, EC, and Pfam similarity using online tool CESSM. CONCLUSIONS: HVSM showed an improvement of up to 4% compared to TCSS, 8% compared to IntelliGO, 12% compared to basic VSM, 6% compared to Resnik, 8% compared to Lin, 11% compared to Jiang, 8% compared to Schlicker, and 11% compared to SimGIC using AUC scores. CESSM test showed HVSM was comparable to SimGIC, and superior to all other similarity measures in CESSM as well as TCSS. Supplementary information and the software are available at https://github.com/kejia1215/HVSM.
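The description above contrasts a basic vector space model with one that also uses ancestor terms linked by "is_a" and "part_of". The following is a minimal sketch of that general idea only, not the authors' HVSM implementation: the toy ontology, the term IDs, and every function name are hypothetical, the weighting is plain binary, descendants are ignored, and the Certainty Factor is not reproduced because the record does not give its formula.

```python
import math

# Hypothetical toy ontology (NOT real GO data): each term maps to the
# parents it reaches through "is_a" or "part_of" edges.
TOY_PARENTS = {
    "GO:A": [],
    "GO:B": ["GO:A"],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:B"],
    "GO:E": ["GO:B", "GO:C"],
}


def with_ancestors(terms, parents):
    """Return the annotated terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, []))
    return seen


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity of two binary term vectors represented as sets."""
    if not terms_a or not terms_b:
        return 0.0
    shared = len(terms_a & terms_b)
    return shared / math.sqrt(len(terms_a) * len(terms_b))


def flat_similarity(annot_a, annot_b):
    """Basic VSM: compare only the directly annotated terms."""
    return cosine_similarity(set(annot_a), set(annot_b))


def expanded_similarity(annot_a, annot_b, parents=TOY_PARENTS):
    """Hierarchy-aware variant: expand each annotation set with ancestors first."""
    return cosine_similarity(
        with_ancestors(annot_a, parents),
        with_ancestors(annot_b, parents),
    )


if __name__ == "__main__":
    gene_1 = ["GO:D"]
    gene_2 = ["GO:E"]
    print("flat:", flat_similarity(gene_1, gene_2))
    print("expanded:", expanded_similarity(gene_1, gene_2))
```

In this toy case the two genes share no directly annotated term, so the flat comparison scores them as unrelated, while the expanded comparison gives them credit for their common ancestors. That is the kind of gap the description attributes to the basic vector space model.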
format Online
Article
Text
id pubmed-5921294
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-59212942018-05-01 An improved approach to infer protein-protein interaction based on a hierarchical vector space model Zhang, Jiongmin Jia, Ke Jia, Jinmeng Qian, Ying BMC Bioinformatics Methodology Article BioMed Central 2018-04-27 /pmc/articles/PMC5921294/ /pubmed/29699476 http://dx.doi.org/10.1186/s12859-018-2152-z Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Zhang, Jiongmin
Jia, Ke
Jia, Jinmeng
Qian, Ying
An improved approach to infer protein-protein interaction based on a hierarchical vector space model
title An improved approach to infer protein-protein interaction based on a hierarchical vector space model
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5921294/
https://www.ncbi.nlm.nih.gov/pubmed/29699476
http://dx.doi.org/10.1186/s12859-018-2152-z