
Improving automatic GO annotation with semantic similarity

BACKGROUND: Automatic functional annotation of proteins is an open research problem in bioinformatics. The growing number of protein entries in public databases, for example in UniProtKB, poses challenges in manual functional annotation. Manual annotation requires expert human curators to search and...

Bibliographic Details
Main authors: Sarker, Bishnu; Khare, Navya; Devignes, Marie-Dominique; Aridhi, Sabeur
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9743508/
https://www.ncbi.nlm.nih.gov/pubmed/36510133
http://dx.doi.org/10.1186/s12859-022-04958-7
_version_ 1784848737662664704
author Sarker, Bishnu
Khare, Navya
Devignes, Marie-Dominique
Aridhi, Sabeur
author_facet Sarker, Bishnu
Khare, Navya
Devignes, Marie-Dominique
Aridhi, Sabeur
author_sort Sarker, Bishnu
collection PubMed
description BACKGROUND: Automatic functional annotation of proteins is an open research problem in bioinformatics. The growing number of protein entries in public databases, for example in UniProtKB, poses challenges for manual functional annotation. Manual annotation requires expert human curators to search and read related research articles, interpret the results, and assign the annotations to the proteins. It is therefore a time-consuming and expensive process, and designing computational tools that perform automatic annotation by leveraging the high-quality manual annotations already available in UniProtKB/SwissProt is an important research problem. RESULTS: In this paper, we extend and adapt the GrAPFI (graph-based automatic protein function inference) method (Sarker et al. in BMC Bioinform 21, 2020; Sarker et al., in: Proceedings of 7th international conference on complex networks and their applications, Cambridge, 2018) for automatic annotation of proteins with Gene Ontology (GO) terms, renaming it GrAPFI-GO. The original GrAPFI method uses label propagation in a similarity graph where proteins are linked through the domains, families, and superfamilies that they share. Here, we also explore various types of similarity measures based on common neighbors in the graph. Moreover, GO terms are arranged hierarchically according to semantic parent–child relations. We therefore propose an efficient pruning and post-processing technique that integrates both semantic similarity and hierarchical relations between the GO terms. We report experimental results comparing the GrAPFI-GO method with and without common-neighbors similarity. We also test the performance of GrAPFI-GO and other GO annotation tools on a benchmark of proteins, with and without the proposed pruning and post-processing procedure.
CONCLUSION: Our results show that the proposed semantic hierarchical post-processing potentially improves the performance of GrAPFI-GO and of other annotation tools as well. Thus, GrAPFI-GO provides an original, efficient, and reusable procedure that exploits the semantic relations among GO terms to improve the automatic annotation of protein functions.
format Online
Article
Text
id pubmed-9743508
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9743508 2022-12-13 Improving automatic GO annotation with semantic similarity Sarker, Bishnu Khare, Navya Devignes, Marie-Dominique Aridhi, Sabeur BMC Bioinformatics Methodology BACKGROUND: Automatic functional annotation of proteins is an open research problem in bioinformatics. The growing number of protein entries in public databases, for example in UniProtKB, poses challenges for manual functional annotation. Manual annotation requires expert human curators to search and read related research articles, interpret the results, and assign the annotations to the proteins. It is therefore a time-consuming and expensive process, and designing computational tools that perform automatic annotation by leveraging the high-quality manual annotations already available in UniProtKB/SwissProt is an important research problem. RESULTS: In this paper, we extend and adapt the GrAPFI (graph-based automatic protein function inference) method (Sarker et al. in BMC Bioinform 21, 2020; Sarker et al., in: Proceedings of 7th international conference on complex networks and their applications, Cambridge, 2018) for automatic annotation of proteins with Gene Ontology (GO) terms, renaming it GrAPFI-GO. The original GrAPFI method uses label propagation in a similarity graph where proteins are linked through the domains, families, and superfamilies that they share. Here, we also explore various types of similarity measures based on common neighbors in the graph. Moreover, GO terms are arranged hierarchically according to semantic parent–child relations. We therefore propose an efficient pruning and post-processing technique that integrates both semantic similarity and hierarchical relations between the GO terms. We report experimental results comparing the GrAPFI-GO method with and without common-neighbors similarity.
We also test the performance of GrAPFI-GO and other GO annotation tools on a benchmark of proteins, with and without the proposed pruning and post-processing procedure. CONCLUSION: Our results show that the proposed semantic hierarchical post-processing potentially improves the performance of GrAPFI-GO and of other annotation tools as well. Thus, GrAPFI-GO provides an original, efficient, and reusable procedure that exploits the semantic relations among GO terms to improve the automatic annotation of protein functions. BioMed Central 2022-12-12 /pmc/articles/PMC9743508/ /pubmed/36510133 http://dx.doi.org/10.1186/s12859-022-04958-7 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Methodology
Sarker, Bishnu
Khare, Navya
Devignes, Marie-Dominique
Aridhi, Sabeur
Improving automatic GO annotation with semantic similarity
title Improving automatic GO annotation with semantic similarity
title_full Improving automatic GO annotation with semantic similarity
title_fullStr Improving automatic GO annotation with semantic similarity
title_full_unstemmed Improving automatic GO annotation with semantic similarity
title_short Improving automatic GO annotation with semantic similarity
title_sort improving automatic go annotation with semantic similarity
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9743508/
https://www.ncbi.nlm.nih.gov/pubmed/36510133
http://dx.doi.org/10.1186/s12859-022-04958-7
work_keys_str_mv AT sarkerbishnu improvingautomaticgoannotationwithsemanticsimilarity
AT kharenavya improvingautomaticgoannotationwithsemanticsimilarity
AT devignesmariedominique improvingautomaticgoannotationwithsemanticsimilarity
AT aridhisabeur improvingautomaticgoannotationwithsemanticsimilarity