Assigning protein function from domain-function associations using DomFun

BACKGROUND: Protein function prediction remains a key challenge. Domain composition affects protein function. Here we present DomFun, a Ruby gem that uses associations between protein domains and functions, calculated using multiple indices based on tripartite network analysis. These domain-function...

Bibliographic Details
Main authors: Rojano, Elena; Jabato, Fernando M.; Perkins, James R.; Córdoba-Caballero, José; García-Criado, Federico; Sillitoe, Ian; Orengo, Christine; Ranea, Juan A. G.; Seoane-Zonjic, Pedro
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8761305/
https://www.ncbi.nlm.nih.gov/pubmed/35033002
http://dx.doi.org/10.1186/s12859-022-04565-6
author Rojano, Elena
Jabato, Fernando M.
Perkins, James R.
Córdoba-Caballero, José
García-Criado, Federico
Sillitoe, Ian
Orengo, Christine
Ranea, Juan A. G.
Seoane-Zonjic, Pedro
author_sort Rojano, Elena
collection PubMed
description BACKGROUND: Protein function prediction remains a key challenge. Domain composition affects protein function. Here we present DomFun, a Ruby gem that uses associations between protein domains and functions, calculated using multiple indices based on tripartite network analysis. These domain-function associations are combined at the protein level to generate protein-function predictions. RESULTS: We analysed 16 tripartite networks connecting homologous superfamily and FunFam domains from CATH-Gene3D with functional annotations from the three Gene Ontology (GO) sub-ontologies, KEGG, and Reactome. We validated the results using the CAFA 3 benchmark platform for GO annotation, finding that, of the multiple association metrics and domain datasets tested, the Simpson index for FunFam domain-function associations combined with Stouffer’s method led to the best performance in almost all scenarios. We also found that using FunFams led to better performance than superfamilies, and that results were better for GO molecular function than for GO biological process terms. DomFun performed as well as the highest-performing method in certain CAFA 3 evaluation procedures in terms of [Formula: see text] and [Formula: see text]. We also implemented our own benchmark procedure, Pathway Prediction Performance (PPP), which can be used to validate function prediction for additional annotation sources, such as KEGG and Reactome. Using PPP, we found results similar to those found with CAFA 3 for GO; moreover, we found good performance for the other annotation sources. As with CAFA 3, the Simpson index with Stouffer’s method led to the top performance in almost all scenarios. CONCLUSIONS: DomFun shows competitive performance with other methods evaluated in CAFA 3 when predicting protein function with GO, although results vary depending on the evaluation procedure.
Through our own benchmark procedure, PPP, we have shown it can also make accurate predictions for KEGG and Reactome. It performs best when using FunFams, combining Simpson index-derived domain-function associations using Stouffer’s method. The tool has been implemented so that it can be easily adapted to incorporate other protein features, such as domain data from other sources, amino acid k-mers and motifs. The DomFun Ruby gem is available from https://rubygems.org/gems/DomFun. Code maintained at https://github.com/ElenaRojano/DomFun. Validation procedure scripts can be found at https://github.com/ElenaRojano/DomFun_project. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04565-6.
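The pipeline summarised in the description (score each domain-function pair through the proteins they share, then combine the scores of a protein's domains) can be illustrated with a minimal Python sketch. It is not DomFun's Ruby implementation. It assumes that the Simpson index is the overlap coefficient |A ∩ B| / min(|A|, |B|) between the proteins carrying a domain and the proteins annotated with a function, and that association scores are standardised against all scores in the network before Stouffer's combination Z = Σz / √k. The function names, the standardisation step and the toy identifiers (D1, F1, P1 and so on) are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def simpson_index(proteins_a, proteins_b):
    """Overlap coefficient between two protein sets (assumed definition)."""
    if not proteins_a or not proteins_b:
        return 0.0
    return len(proteins_a & proteins_b) / min(len(proteins_a), len(proteins_b))


def domain_function_associations(domain_to_proteins, function_to_proteins):
    """Score every domain-function pair that shares at least one protein."""
    assoc = {}
    for domain, d_prots in domain_to_proteins.items():
        for function, f_prots in function_to_proteins.items():
            score = simpson_index(d_prots, f_prots)
            if score > 0:
                assoc[(domain, function)] = score
    return assoc


def stouffer(z_scores):
    """Unweighted Stouffer combination of z-scores."""
    return sum(z_scores) / sqrt(len(z_scores))


def predict_functions(protein_domains, assoc):
    """Combine, per function, the scores of the domains found in one protein.

    Assumption: scores are turned into z-scores using the mean and standard
    deviation of all association values; the real tool may normalise differently.
    """
    values = list(assoc.values())
    mu, sigma = mean(values), pstdev(values)
    by_function = {}
    for (domain, function), score in assoc.items():
        if domain in protein_domains:
            z = (score - mu) / sigma if sigma > 0 else 0.0
            by_function.setdefault(function, []).append(z)
    return {function: stouffer(zs) for function, zs in by_function.items()}


# Hypothetical toy tripartite network: domains - proteins - functions.
domain_to_proteins = {"D1": {"P1", "P2", "P3"}, "D2": {"P2", "P3"}, "D3": {"P4"}}
function_to_proteins = {"F1": {"P1", "P2", "P3"}, "F2": {"P3", "P4"}}

assoc = domain_function_associations(domain_to_proteins, function_to_proteins)
scores = predict_functions({"D1", "D2"}, assoc)  # query protein with two domains
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy network both query domains share all of their proteins with F1, so F1 ranks above F2 for the query protein. A real run would replace the dictionaries with CATH-Gene3D domain assignments and GO, KEGG or Reactome annotations.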
format Online
Article
Text
id pubmed-8761305
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8761305 2022-01-18 Assigning protein function from domain-function associations using DomFun Rojano, Elena Jabato, Fernando M. Perkins, James R. Córdoba-Caballero, José García-Criado, Federico Sillitoe, Ian Orengo, Christine Ranea, Juan A. G. Seoane-Zonjic, Pedro BMC Bioinformatics Software BioMed Central 2022-01-15 /pmc/articles/PMC8761305/ /pubmed/35033002 http://dx.doi.org/10.1186/s12859-022-04565-6 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Assigning protein function from domain-function associations using DomFun
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8761305/
https://www.ncbi.nlm.nih.gov/pubmed/35033002
http://dx.doi.org/10.1186/s12859-022-04565-6