Computational discovery of direct associations between GO terms and protein domains

Bibliographic details
Main authors: Alborzi, Seyed Ziaeddin; Ritchie, David W.; Devignes, Marie-Dominique
Format: Online article (text)
Language: English
Published: BioMed Central, 2018
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245584/
https://www.ncbi.nlm.nih.gov/pubmed/30453875
http://dx.doi.org/10.1186/s12859-018-2380-2
Collection: PubMed
Abstract:
BACKGROUND: Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology). However, many proteins consist of multiple domains, and each domain, or some combination of domains, can be responsible for a particular molecular function. Therefore, identifying which domains should be associated with a specific function is a non-trivial task.
RESULTS: We describe a general approach for the computational discovery of associations between different sets of annotations by formalising the problem as a bipartite graph enrichment problem in the setting of a tripartite graph. We call this approach “CODAC” (for COmputational Discovery of Direct Associations using Common Neighbours). As one application of this approach, we describe “GODomainMiner” for associating GO terms with protein domains. We used GODomainMiner to predict GO-domain associations between each of the three GO namespaces (MF, BP, and CC) and the Pfam, CATH, and SCOP domain classifications. Overall, GODomainMiner yields average enrichments of 15-, 41- and 25-fold GO-domain associations compared to the existing GO annotations in these three domain classifications, respectively.
CONCLUSIONS: These associations could potentially be used to annotate many of the protein chains in the Protein Data Bank and protein sequences in UniProt whose domain composition is known but which currently lack GO annotation.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2380-2) contains supplementary material, which is available to authorized users.
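The RESULTS paragraph describes CODAC only at a high level: GO terms and domains are linked indirectly through the proteins they annotate (a GO-protein-domain tripartite graph), and a direct GO-domain edge is proposed when the two share enough common protein neighbours. The following is a minimal sketch of that common-neighbours idea, assuming a plain Jaccard score over neighbour sets. The function name `infer_associations`, the `min_score` and `min_common` thresholds, and the toy identifiers are hypothetical illustrations, not the scoring function, cut-offs, or data used by the authors of CODAC/GODomainMiner.

```python
from collections import defaultdict
from itertools import product


def infer_associations(protein_go, protein_domains, min_score=0.5, min_common=2):
    """Hypothetical sketch: score GO-term/domain pairs by shared proteins.

    protein_go:      dict protein -> set of GO terms   (GO-protein edges)
    protein_domains: dict protein -> set of domains    (protein-domain edges)
    Returns {(go_term, domain): jaccard_score} for pairs passing both thresholds.
    The Jaccard score and both thresholds are assumptions, not CODAC's method.
    """
    # Invert each bipartite layer so every GO term and domain maps to its proteins.
    go_to_prot = defaultdict(set)
    dom_to_prot = defaultdict(set)
    for prot, terms in protein_go.items():
        for term in terms:
            go_to_prot[term].add(prot)
    for prot, doms in protein_domains.items():
        for dom in doms:
            dom_to_prot[dom].add(prot)

    associations = {}
    for term, dom in product(go_to_prot, dom_to_prot):
        common = go_to_prot[term] & dom_to_prot[dom]  # shared protein neighbours
        if len(common) < min_common:
            continue
        union = go_to_prot[term] | dom_to_prot[dom]
        score = len(common) / len(union)  # Jaccard similarity of neighbour sets
        if score >= min_score:
            associations[(term, dom)] = score
    return associations


# Toy, made-up data: P1 carries two domains (D1, D2), so which one explains GO:A?
protein_go = {
    "P1": {"GO:A"},
    "P2": {"GO:A"},
    "P3": {"GO:A", "GO:B"},
    "P4": {"GO:B"},
}
protein_domains = {
    "P1": {"D1", "D2"},
    "P2": {"D1"},
    "P3": {"D1", "D3"},
    "P4": {"D3"},
    "P5": {"D1"},  # domain-annotated protein with no GO annotation yet
}

result = infer_associations(protein_go, protein_domains)
print(sorted(result.items()))
```

In the toy data only D1 recurs across the proteins annotated with GO:A, so (GO:A, D1) survives with score 3/4 while (GO:A, D2) is rejected; this mirrors, in miniature, the multi-domain ambiguity described in the BACKGROUND paragraph. Enumerating every GO-domain pair is quadratic and is used here only for readability; a realistic implementation would visit only pairs that actually co-occur on some protein.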
Record ID: pubmed-6245584
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Research article), BioMed Central
Published online: 2018-11-20
Rights: © The Author(s) 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.