Interspecies gene function prediction using semantic similarity

BACKGROUND: Gene Ontology (GO) is a collaborative project that maintains and develops controlled vocabulary (or terms) to describe the molecular function, biological roles and cellular location of gene products in a hierarchical ontology. GO also provides GO annotations that associate genes with GO...

Bibliographic Details
Main authors: Yu, Guoxian, Luo, Wei, Fu, Guangyuan, Wang, Jun
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5260010/
https://www.ncbi.nlm.nih.gov/pubmed/28155711
http://dx.doi.org/10.1186/s12918-016-0361-5
author Yu, Guoxian
Luo, Wei
Fu, Guangyuan
Wang, Jun
collection PubMed
description BACKGROUND: Gene Ontology (GO) is a collaborative project that maintains and develops a controlled vocabulary (or terms) to describe the molecular function, biological roles and cellular location of gene products in a hierarchical ontology. GO also provides GO annotations that associate genes with GO terms. The GO Consortium independently and collaboratively annotates gene products with terms, mainly for the model organisms (or species) its members are interested in. Due to experimental ethics, the research interests of biologists and resource limitations, homologous genes from different species are currently annotated with different terms. These differences are attributable more to incomplete annotation of genes than to functional differences between them. RESULTS: Semantic similarity between genes is derived from the GO hierarchy and the annotations of genes. It is positively correlated with similarity derived from various types of biological data and has been applied to predict gene function. In this paper, we investigate whether it is possible to replenish the annotations of incompletely annotated genes by using semantic similarity between genes from two homologous species. For this investigation, we use three representative semantic similarity metrics to compute the similarity between genes from two species. Next, we determine the k nearest neighbor genes from the two species based on the chosen metric and then use the terms annotated to the k neighbors of a gene to replenish the annotations of that gene. We perform experiments on archived (from Jan-2014 to Jan-2016) GO annotations of four species (Human, Mouse, Danio rerio and Arabidopsis thaliana) to assess the contribution of semantic similarity between genes from different species. 
The experimental results demonstrate that: (1) semantic similarity between genes from homologous species contributes much more to improved accuracy (by 53.22%) than semantic similarity between genes from a single species alone or from two species with low homology; (2) GO annotations of genes from homologous species are complementary to each other. CONCLUSIONS: Our study shows that semantic similarity based interspecies gene function annotation from homologous species is more effective than traditional intraspecies approaches. This work can promote further research on semantic similarity based function prediction across species. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12918-016-0361-5) contains supplementary material, which is available to authorized users.
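The neighbor-based replenishment step described in the abstract can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name its three semantic similarity metrics, so the hypothetical `jaccard_similarity` (plain overlap of annotated term sets, ignoring the GO hierarchy) stands in for them. The function names, the unweighted voting rule with its `min_votes` threshold, and the toy gene IDs and terms are all assumptions made for illustration.

```python
from collections import Counter


def jaccard_similarity(terms_a, terms_b):
    """Hypothetical stand-in metric: overlap of two annotation sets.

    Unlike GO semantic similarity, this ignores the GO hierarchy entirely.
    """
    if not terms_a and not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def replenish(target, annotations, k=2, min_votes=1):
    """Suggest new terms for `target` from its k most similar genes.

    `annotations` maps gene IDs (from one or both species) to sets of terms.
    Returns the terms carried by at least `min_votes` of the k neighbors
    that the target gene is not already annotated with.
    """
    own = annotations[target]
    # Rank every other gene by its similarity to the target gene.
    candidates = [g for g in annotations if g != target]
    neighbors = sorted(
        candidates,
        key=lambda g: jaccard_similarity(own, annotations[g]),
        reverse=True,
    )[:k]
    # Count how many of the k neighbors carry each term (unweighted votes).
    votes = Counter(t for g in neighbors for t in annotations[g])
    return {t for t, n in votes.items() if n >= min_votes and t not in own}


# Made-up toy data: two "species" prefixes and abstract term labels.
toy = {
    "human_A": {"t1", "t2"},
    "mouse_A": {"t1", "t2", "t3"},
    "human_B": {"t4"},
    "mouse_B": {"t1", "t3", "t5"},
}
print(replenish("human_A", toy, k=2, min_votes=2))  # → {'t3'}
```

Here the two nearest neighbors of `human_A` are `mouse_A` and `mouse_B`, and only `t3` is both new and shared by both of them. Unweighted voting keeps the sketch short; weighting each vote by the neighbor's similarity score is a common alternative.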
format Online
Article
Text
id pubmed-5260010
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5260010 2017-01-26 Interspecies gene function prediction using semantic similarity Yu, Guoxian Luo, Wei Fu, Guangyuan Wang, Jun BMC Syst Biol Research BioMed Central 2016-12-23 /pmc/articles/PMC5260010/ /pubmed/28155711 http://dx.doi.org/10.1186/s12918-016-0361-5 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Interspecies gene function prediction using semantic similarity
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5260010/
https://www.ncbi.nlm.nih.gov/pubmed/28155711
http://dx.doi.org/10.1186/s12918-016-0361-5