GOntoSim: a semantic similarity measure based on LCA and common descendants

The Gene Ontology (GO) is a controlled vocabulary that captures the semantics or context of an entity based on its functional role. Biomedical entities are frequently compared to each other to find similarities to help in data annotation and knowledge transfer. In this study, we propose GOntoSim, a...

Bibliographic Details
Main authors: Kamran, Amna Binte; Naveed, Hammad
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8907294/
https://www.ncbi.nlm.nih.gov/pubmed/35264663
http://dx.doi.org/10.1038/s41598-022-07624-3
_version_ 1784665609717415936
author Kamran, Amna Binte
Naveed, Hammad
collection PubMed
description The Gene Ontology (GO) is a controlled vocabulary that captures the semantics or context of an entity based on its functional role. Biomedical entities are frequently compared with each other to find similarities that help in data annotation and knowledge transfer. In this study, we propose GOntoSim, a novel method to determine the functional similarity between genes. GOntoSim quantifies the similarity between pairs of GO terms by taking the graph structure and the information content of nodes into consideration. Our measure quantifies the similarity between the ancestors of the GO terms accurately. It also takes into account the common children of the GO terms. GOntoSim is evaluated using the entire Enzyme Dataset containing 10,890 proteins and 97,544 GO annotations. The enzymes are clustered and compared with the Gold Standard EC numbers. At level 1 of the EC Numbers for Molecular Function, GOntoSim achieves a purity score of 0.75, compared with 0.47 and 0.51 for GOGO and Wang, respectively. GOntoSim can handle the noisy IEA annotations: with IEA annotations, we achieve a purity score of 0.94, in contrast to 0.48 for both GOGO and Wang, at level 1 of the EC Numbers. GOntoSim can be freely accessed at http://www.cbrlab.org/GOntoSim.html.
format Online
Article
Text
id pubmed-8907294
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8907294 2022-03-11 GOntoSim: a semantic similarity measure based on LCA and common descendants Kamran, Amna Binte Naveed, Hammad Sci Rep Article Nature Publishing Group UK 2022-03-09 /pmc/articles/PMC8907294/ /pubmed/35264663 http://dx.doi.org/10.1038/s41598-022-07624-3 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title GOntoSim: a semantic similarity measure based on LCA and common descendants
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8907294/
https://www.ncbi.nlm.nih.gov/pubmed/35264663
http://dx.doi.org/10.1038/s41598-022-07624-3
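
The description above reports clustering quality as a purity score measured against the Gold Standard EC numbers. This record does not give the authors' exact evaluation procedure, so the following is only a minimal sketch of the standard purity definition (each cluster is credited with its most frequent true class, and the credited counts are divided by the total number of items). The function name `purity` and the toy labels are hypothetical and are not taken from the paper or from GOntoSim.

```python
from collections import Counter, defaultdict


def purity(cluster_labels, true_labels):
    """Standard clustering purity: sum of per-cluster majority counts / N.

    Assumption: this is the textbook definition, not necessarily the exact
    procedure used in the GOntoSim evaluation.
    """
    if len(cluster_labels) != len(true_labels):
        raise ValueError("cluster_labels and true_labels must have equal length")
    if not cluster_labels:
        raise ValueError("at least one item is required")

    # Count how often each true class occurs inside each cluster.
    class_counts = defaultdict(Counter)
    for cluster, true_class in zip(cluster_labels, true_labels):
        class_counts[cluster][true_class] += 1

    # Credit each cluster with the size of its largest true class.
    majority_total = sum(max(counts.values()) for counts in class_counts.values())
    return majority_total / len(true_labels)


# Hypothetical toy data: six enzymes, two clusters, level-1 EC classes.
clusters = [0, 0, 0, 1, 1, 1]
ec_level1 = ["EC1", "EC1", "EC2", "EC2", "EC2", "EC3"]
score = purity(clusters, ec_level1)
print(f"purity = {score:.3f}")
```

Under this definition a score of 1.0 means every cluster contains a single EC class. Purity tends to rise as the number of clusters grows, so scores are only comparable between methods evaluated with the same number of clusters.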