
FunFam protein families improve residue level molecular function prediction

BACKGROUND: The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction o...


Bibliographic Details
Main authors: Scheibenreif, Linus; Littmann, Maria; Orengo, Christine; Rost, Burkhard
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6639920/
https://www.ncbi.nlm.nih.gov/pubmed/31319797
http://dx.doi.org/10.1186/s12859-019-2988-x
_version_ 1783436556653035520
author Scheibenreif, Linus
Littmann, Maria
Orengo, Christine
Rost, Burkhard
collection PubMed
description BACKGROUND: The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction of protein binding residues. RESULTS: FunFam members agreed, on average, in 36.9 ± 0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping de novo binding residue prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFam resulted in consensus predictions for those residues that were aligned and predicted alike (binding/non-binding) within a FunFam. This simple consensus increased the F1-score (for binding) 1.5-fold over the original prediction method. Variation of the threshold for how many proteins in the consensus prediction had to agree provided a convenient control of accuracy/precision and coverage/recall, e.g. reaching a precision as high as 60.8 ± 0.4% for a stringent threshold. CONCLUSIONS: The FunFams outperformed even the carefully curated EC numbers in terms of agreement of binding site residues. Additionally, we assume that our proof-of-principle through the prediction of protein binding residues will be relevant for many other solutions profiting from FunFams to infer functional information at the residue level. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2988-x) contains supplementary material, which is available to authorized users.
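The consensus step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the input layout (a gapped alignment per FunFam plus a set of 0-based predicted binding positions per protein), and the voting rule (fraction of aligned members predicted as binding must reach a threshold) are all assumptions made for illustration. Raising the threshold should trade recall for precision, which is the control the abstract mentions.

```python
from typing import Dict, List, Set, Tuple


def column_votes(alignment: Dict[str, str],
                 predicted: Dict[str, Set[int]]) -> List[Tuple[int, int]]:
    """Return (members aligned, members predicted binding) per alignment column.

    Hypothetical inputs: `alignment` maps a protein name to its gapped row
    ('-' marks a gap); `predicted` maps a protein name to the 0-based
    positions (in the ungapped sequence) predicted as binding.
    """
    length = len(next(iter(alignment.values())))
    aligned = [0] * length
    binding = [0] * length
    for name, row in alignment.items():
        pos = 0  # index into the ungapped sequence of this member
        for col, char in enumerate(row):
            if char == "-":
                continue
            aligned[col] += 1
            if pos in predicted.get(name, set()):
                binding[col] += 1
            pos += 1
    return list(zip(aligned, binding))


def consensus_columns(alignment: Dict[str, str],
                      predicted: Dict[str, Set[int]],
                      threshold: float = 0.5) -> Set[int]:
    """Columns where the fraction of aligned members predicted as binding
    is at least `threshold` (an assumed voting rule)."""
    votes = column_votes(alignment, predicted)
    return {col for col, (n, b) in enumerate(votes)
            if n > 0 and b / n >= threshold}


def precision_recall_f1(predicted_cols: Set[int],
                        true_cols: Set[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the binding class."""
    tp = len(predicted_cols & true_cols)
    precision = tp / len(predicted_cols) if predicted_cols else 0.0
    recall = tp / len(true_cols) if true_cols else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only.
alignment = {"A": "MK-LV", "B": "MKTLV", "C": "-KTLI"}
predicted = {"A": {1, 2}, "B": {1, 4}, "C": {0}}
true_binding_columns = {1, 3}

for threshold in (0.3, 1.0):
    cols = consensus_columns(alignment, predicted, threshold)
    print(threshold, sorted(cols), precision_recall_f1(cols, true_binding_columns))
```

On this toy input the lenient threshold keeps three columns and the stringent one keeps only the column on which all members agree, so precision rises while recall drops.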
format Online
Article
Text
id pubmed-6639920
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6639920 2019-07-29 FunFam protein families improve residue level molecular function prediction Scheibenreif, Linus Littmann, Maria Orengo, Christine Rost, Burkhard BMC Bioinformatics Research Article
BioMed Central 2019-07-18 /pmc/articles/PMC6639920/ /pubmed/31319797 http://dx.doi.org/10.1186/s12859-019-2988-x Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title FunFam protein families improve residue level molecular function prediction
topic Research Article