A shortest-path graph kernel for estimating gene product semantic similarity

BACKGROUND: Existing methods for calculating semantic similarity between gene products using the Gene Ontology (GO) often rely on external resources, which are not part of the ontology. Consequently, changes in these external resources, such as biased term distribution caused by shifting of hot research...

Bibliographic Details
Main authors: Alvarez, Marco A, Qi, Xiaojun, Yan, Changhui
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3161911/
https://www.ncbi.nlm.nih.gov/pubmed/21801410
http://dx.doi.org/10.1186/2041-1480-2-3
author Alvarez, Marco A
Qi, Xiaojun
Yan, Changhui
collection PubMed
description BACKGROUND: Existing methods for calculating semantic similarity between gene products using the Gene Ontology (GO) often rely on external resources, which are not part of the ontology. Consequently, changes in these external resources, such as biased term distribution caused by shifting of hot research topics, will affect the calculation of semantic similarity. One way to avoid this problem is to use semantic methods that are "intrinsic" to the ontology, i.e., independent of external knowledge. RESULTS: We present a shortest-path graph kernel (spgk) method that relies exclusively on the GO and its structure. In spgk, a gene product is represented by an induced subgraph of the GO, which consists of all the GO terms annotating it. A shortest-path graph kernel is then used to compute the similarity between two such graphs. In a comprehensive evaluation using a benchmark dataset, spgk compares favorably with other methods that depend on external resources. Compared with simUI, a method that is also intrinsic to GO, spgk achieves slightly better results on the benchmark dataset. Statistical tests show that the improvement is significant when the resolution and the EC similarity correlation coefficient are used to measure performance, but insignificant when the Pfam similarity correlation coefficient is used. CONCLUSIONS: Spgk uses a graph kernel method that runs in polynomial time to exploit the structure of the GO when calculating semantic similarity between gene products. It provides an alternative, with comparable performance, both to methods that use external resources and to other "intrinsic" methods.
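The idea described in the abstract (represent each gene product as a subgraph of the GO induced by its annotating terms, then compare two subgraphs by their shortest paths) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a tiny made-up ontology with hypothetical term IDs (T1 to T6), treats edges as undirected when measuring path lengths, uses a simple delta kernel (two paths match only if they join the same pair of terms and have the same length), and normalizes by the self-similarities. All function names are illustrative.

```python
from collections import deque

def induced_subgraph(edges, terms):
    """Adjacency map restricted to `terms`; edges are treated as undirected (assumption)."""
    terms = set(terms)
    adj = {t: set() for t in terms}
    for child, parent in edges:
        if child in terms and parent in terms:
            adj[child].add(parent)
            adj[parent].add(child)
    return adj

def shortest_paths(adj):
    """All-pairs shortest path lengths by BFS, keyed by the ordered pair (u, v) with u < v."""
    dist = {}
    for src in adj:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        for dst, d in seen.items():
            if src < dst:
                dist[(src, dst)] = d
    return dist

def sp_kernel(adj_a, adj_b):
    """Delta kernel: count shortest paths with the same endpoints and the same length."""
    paths_a, paths_b = shortest_paths(adj_a), shortest_paths(adj_b)
    return sum(1 for pair, d in paths_a.items() if paths_b.get(pair) == d)

def spgk_similarity(edges, terms_a, terms_b):
    """Normalized kernel value in [0, 1]; 0.0 if either graph has no paths."""
    ga, gb = induced_subgraph(edges, terms_a), induced_subgraph(edges, terms_b)
    k_ab, k_aa, k_bb = sp_kernel(ga, gb), sp_kernel(ga, ga), sp_kernel(gb, gb)
    if k_aa == 0 or k_bb == 0:
        return 0.0
    return k_ab / (k_aa * k_bb) ** 0.5

# Hypothetical toy ontology as (child, parent) edges; not real GO terms.
EDGES = [("T2", "T1"), ("T3", "T1"), ("T4", "T2"), ("T5", "T2"), ("T6", "T3")]
GENE_A = ["T1", "T2", "T4"]
GENE_B = ["T1", "T2", "T5"]
GENE_C = ["T1", "T3", "T6"]

if __name__ == "__main__":
    print(spgk_similarity(EDGES, GENE_A, GENE_B))
    print(spgk_similarity(EDGES, GENE_A, GENE_C))
```

In this toy setting, gene products A and B share one shortest path (T1 to T2), while A and C share none, so A scores higher against B than against C. Every step (BFS per node, then a dictionary lookup per path) is polynomial in the number of terms, in line with the polynomial-time claim in the abstract.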
format Online
Article
Text
id pubmed-3161911
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3161911 2011-08-26 A shortest-path graph kernel for estimating gene product semantic similarity Alvarez, Marco A Qi, Xiaojun Yan, Changhui J Biomed Semantics Research BioMed Central 2011-07-29 /pmc/articles/PMC3161911/ /pubmed/21801410 http://dx.doi.org/10.1186/2041-1480-2-3 Text en Copyright ©2011 Alvarez et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A shortest-path graph kernel for estimating gene product semantic similarity
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3161911/
https://www.ncbi.nlm.nih.gov/pubmed/21801410
http://dx.doi.org/10.1186/2041-1480-2-3