Computational discovery of direct associations between GO terms and protein domains
BACKGROUND: Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology), for example. However, many proteins consist of multiple domains, and each domain, or some combination of domains, ca...
Main authors: | Alborzi, Seyed Ziaeddin; Ritchie, David W.; Devignes, Marie-Dominique |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245584/ https://www.ncbi.nlm.nih.gov/pubmed/30453875 http://dx.doi.org/10.1186/s12859-018-2380-2 |
_version_ | 1783372268702793728 |
---|---|
author | Alborzi, Seyed Ziaeddin Ritchie, David W. Devignes, Marie-Dominique |
author_facet | Alborzi, Seyed Ziaeddin Ritchie, David W. Devignes, Marie-Dominique |
author_sort | Alborzi, Seyed Ziaeddin |
collection | PubMed |
description | BACKGROUND: Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology), for example. However, many proteins consist of multiple domains, and each domain, or some combination of domains, can be responsible for a particular molecular function. Therefore, identifying which domains should be associated with a specific function is a non-trivial task. RESULTS: We describe a general approach for the computational discovery of associations between different sets of annotations by formalising the problem as a bipartite graph enrichment problem in the setting of a tripartite graph. We call this approach “CODAC” (for COmputational Discovery of Direct Associations using Common Neighbours). As one application of this approach, we describe “GODomainMiner” for associating GO terms with protein domains. We used GODomainMiner to predict GO-domain associations between each of the 3 GO ontology namespaces (MF, BP, and CC) and the Pfam, CATH, and SCOP domain classifications. Overall, GODomainMiner yields average enrichments of 15-, 41- and 25-fold GO-domain associations compared to the existing GO annotations in these 3 domain classifications, respectively. CONCLUSIONS: These associations could potentially be used to annotate many of the protein chains in the Protein Databank and protein sequences in UniProt whose domain composition is known but which currently lack GO annotation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2380-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6245584 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-62455842018-11-26 Computational discovery of direct associations between GO terms and protein domains Alborzi, Seyed Ziaeddin Ritchie, David W. Devignes, Marie-Dominique BMC Bioinformatics Research BACKGROUND: Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology), for example. However, many proteins consist of multiple domains, and each domain, or some combination of domains, can be responsible for a particular molecular function. Therefore, identifying which domains should be associated with a specific function is a non-trivial task. RESULTS: We describe a general approach for the computational discovery of associations between different sets of annotations by formalising the problem as a bipartite graph enrichment problem in the setting of a tripartite graph. We call this approach “CODAC” (for COmputational Discovery of Direct Associations using Common Neighbours). As one application of this approach, we describe “GODomainMiner” for associating GO terms with protein domains. We used GODomainMiner to predict GO-domain associations between each of the 3 GO ontology namespaces (MF, BP, and CC) and the Pfam, CATH, and SCOP domain classifications. Overall, GODomainMiner yields average enrichments of 15-, 41- and 25-fold GO-domain associations compared to the existing GO annotations in these 3 domain classifications, respectively. CONCLUSIONS: These associations could potentially be used to annotate many of the protein chains in the Protein Databank and protein sequences in UniProt whose domain composition is known but which currently lack GO annotation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2380-2) contains supplementary material, which is available to authorized users. 
BioMed Central 2018-11-20 /pmc/articles/PMC6245584/ /pubmed/30453875 http://dx.doi.org/10.1186/s12859-018-2380-2 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Alborzi, Seyed Ziaeddin Ritchie, David W. Devignes, Marie-Dominique Computational discovery of direct associations between GO terms and protein domains |
title | Computational discovery of direct associations between GO terms and protein domains |
title_full | Computational discovery of direct associations between GO terms and protein domains |
title_fullStr | Computational discovery of direct associations between GO terms and protein domains |
title_full_unstemmed | Computational discovery of direct associations between GO terms and protein domains |
title_short | Computational discovery of direct associations between GO terms and protein domains |
title_sort | computational discovery of direct associations between go terms and protein domains |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245584/ https://www.ncbi.nlm.nih.gov/pubmed/30453875 http://dx.doi.org/10.1186/s12859-018-2380-2 |
work_keys_str_mv | AT alborziseyedziaeddin computationaldiscoveryofdirectassociationsbetweengotermsandproteindomains AT ritchiedavidw computationaldiscoveryofdirectassociationsbetweengotermsandproteindomains AT devignesmariedominique computationaldiscoveryofdirectassociationsbetweengotermsandproteindomains |