GOntoSim: a semantic similarity measure based on LCA and common descendants
The Gene Ontology (GO) is a controlled vocabulary that captures the semantics or context of an entity based on its functional role. Biomedical entities are frequently compared to each other to find similarities to help in data annotation and knowledge transfer. In this study, we propose GOntoSim, a...
Main authors: | Kamran, Amna Binte; Naveed, Hammad |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8907294/ https://www.ncbi.nlm.nih.gov/pubmed/35264663 http://dx.doi.org/10.1038/s41598-022-07624-3 |
_version_ | 1784665609717415936 |
---|---|
author | Kamran, Amna Binte Naveed, Hammad |
author_facet | Kamran, Amna Binte Naveed, Hammad |
author_sort | Kamran, Amna Binte |
collection | PubMed |
description | The Gene Ontology (GO) is a controlled vocabulary that captures the semantics or context of an entity based on its functional role. Biomedical entities are frequently compared to each other to find similarities to help in data annotation and knowledge transfer. In this study, we propose GOntoSim, a novel method to determine the functional similarity between genes. GOntoSim quantifies the similarity between pairs of GO terms by taking the graph structure and the information content of nodes into consideration. Our measure quantifies the similarity between the ancestors of the GO terms accurately. It also takes into account the common children of the GO terms. GOntoSim is evaluated using the entire Enzyme Dataset containing 10,890 proteins and 97,544 GO annotations. The enzymes are clustered and compared with the Gold Standard EC numbers. At level 1 of the EC Numbers for Molecular Function, GOntoSim achieves a purity score of 0.75, compared to 0.47 and 0.51 for GOGO and Wang, respectively. GOntoSim can handle the noisy IEA annotations. We achieve a purity score of 0.94, in contrast to 0.48 for both GOGO and Wang, at level 1 of the EC Numbers with IEA annotations. GOntoSim can be freely accessed at http://www.cbrlab.org/GOntoSim.html. |
format | Online Article Text |
id | pubmed-8907294 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8907294 2022-03-11 GOntoSim: a semantic similarity measure based on LCA and common descendants Kamran, Amna Binte Naveed, Hammad Sci Rep Article The Gene Ontology (GO) is a controlled vocabulary that captures the semantics or context of an entity based on its functional role. Biomedical entities are frequently compared to each other to find similarities to help in data annotation and knowledge transfer. In this study, we propose GOntoSim, a novel method to determine the functional similarity between genes. GOntoSim quantifies the similarity between pairs of GO terms by taking the graph structure and the information content of nodes into consideration. Our measure quantifies the similarity between the ancestors of the GO terms accurately. It also takes into account the common children of the GO terms. GOntoSim is evaluated using the entire Enzyme Dataset containing 10,890 proteins and 97,544 GO annotations. The enzymes are clustered and compared with the Gold Standard EC numbers. At level 1 of the EC Numbers for Molecular Function, GOntoSim achieves a purity score of 0.75, compared to 0.47 and 0.51 for GOGO and Wang, respectively. GOntoSim can handle the noisy IEA annotations. We achieve a purity score of 0.94, in contrast to 0.48 for both GOGO and Wang, at level 1 of the EC Numbers with IEA annotations. GOntoSim can be freely accessed at http://www.cbrlab.org/GOntoSim.html. Nature Publishing Group UK 2022-03-09 /pmc/articles/PMC8907294/ /pubmed/35264663 http://dx.doi.org/10.1038/s41598-022-07624-3 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. 
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Kamran, Amna Binte Naveed, Hammad GOntoSim: a semantic similarity measure based on LCA and common descendants |
title | GOntoSim: a semantic similarity measure based on LCA and common descendants |
title_full | GOntoSim: a semantic similarity measure based on LCA and common descendants |
title_fullStr | GOntoSim: a semantic similarity measure based on LCA and common descendants |
title_full_unstemmed | GOntoSim: a semantic similarity measure based on LCA and common descendants |
title_short | GOntoSim: a semantic similarity measure based on LCA and common descendants |
title_sort | gontosim: a semantic similarity measure based on lca and common descendants |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8907294/ https://www.ncbi.nlm.nih.gov/pubmed/35264663 http://dx.doi.org/10.1038/s41598-022-07624-3 |
work_keys_str_mv | AT kamranamnabinte gontosimasemanticsimilaritymeasurebasedonlcaandcommondescendants AT naveedhammad gontosimasemanticsimilaritymeasurebasedonlcaandcommondescendants |
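
The description above reports clustering purity against gold-standard EC numbers but does not define the metric. The following is a minimal sketch of the standard purity calculation, assuming purity is the fraction of items that fall in the majority gold class of their cluster; the record does not confirm that the authors computed it exactly this way. The function name `purity` and the toy cluster and label values are hypothetical and not taken from the paper.

```python
from collections import Counter


def purity(cluster_ids, gold_labels):
    """Standard clustering purity (assumed definition, not from the record).

    For each cluster, count the items belonging to its most common gold
    label, sum those counts over all clusters, and divide by the total
    number of items.
    """
    if len(cluster_ids) != len(gold_labels):
        raise ValueError("cluster_ids and gold_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Map each cluster id to a tally of the gold labels it contains.
    tallies = {}
    for cluster, label in zip(cluster_ids, gold_labels):
        tallies.setdefault(cluster, Counter())[label] += 1

    # Sum the size of the majority class in every cluster.
    majority_total = sum(max(counts.values()) for counts in tallies.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six enzymes in two clusters, labelled with
# made-up level-1 EC classes.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_ec_level1 = ["1", "1", "2", "2", "2", "3"]
toy_score = purity(toy_clusters, toy_ec_level1)
```

A score of 1.0 means every cluster contains a single gold class; lower values indicate mixed clusters.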