FunFam protein families improve residue level molecular function prediction
BACKGROUND: The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction o...
Main authors: | Scheibenreif, Linus; Littmann, Maria; Orengo, Christine; Rost, Burkhard |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6639920/ https://www.ncbi.nlm.nih.gov/pubmed/31319797 http://dx.doi.org/10.1186/s12859-019-2988-x |
_version_ | 1783436556653035520 |
---|---|
author | Scheibenreif, Linus Littmann, Maria Orengo, Christine Rost, Burkhard |
author_facet | Scheibenreif, Linus Littmann, Maria Orengo, Christine Rost, Burkhard |
author_sort | Scheibenreif, Linus |
collection | PubMed |
description | BACKGROUND: The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction of protein binding residues. RESULTS: FunFam members agreed, on average, in 36.9 ± 0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping de novo binding residue prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFam resulted in consensus predictions for those residues that were aligned and predicted alike (binding/non-binding) within a FunFam. This simple consensus increased the F1-score (for binding) 1.5-fold over the original prediction method. Variation of the threshold for how many proteins in the consensus prediction had to agree provided a convenient control of accuracy/precision and coverage/recall, e.g. reaching a precision as high as 60.8 ± 0.4% for a stringent threshold. CONCLUSIONS: The FunFams outperformed even the carefully curated EC numbers in terms of agreement of binding site residues. Additionally, we assume that our proof-of-principle through the prediction of protein binding residues will be relevant for many other solutions profiting from FunFams to infer functional information at the residue level. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2988-x) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6639920 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6639920 2019-07-29 FunFam protein families improve residue level molecular function prediction Scheibenreif, Linus Littmann, Maria Orengo, Christine Rost, Burkhard BMC Bioinformatics Research Article
BioMed Central 2019-07-18 /pmc/articles/PMC6639920/ /pubmed/31319797 http://dx.doi.org/10.1186/s12859-019-2988-x Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Scheibenreif, Linus Littmann, Maria Orengo, Christine Rost, Burkhard FunFam protein families improve residue level molecular function prediction |
title | FunFam protein families improve residue level molecular function prediction |
title_full | FunFam protein families improve residue level molecular function prediction |
title_fullStr | FunFam protein families improve residue level molecular function prediction |
title_full_unstemmed | FunFam protein families improve residue level molecular function prediction |
title_short | FunFam protein families improve residue level molecular function prediction |
title_sort | funfam protein families improve residue level molecular function prediction |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6639920/ https://www.ncbi.nlm.nih.gov/pubmed/31319797 http://dx.doi.org/10.1186/s12859-019-2988-x |
work_keys_str_mv | AT scheibenreiflinus funfamproteinfamiliesimproveresiduelevelmolecularfunctionprediction AT littmannmaria funfamproteinfamiliesimproveresiduelevelmolecularfunctionprediction AT orengochristine funfamproteinfamiliesimproveresiduelevelmolecularfunctionprediction AT rostburkhard funfamproteinfamiliesimproveresiduelevelmolecularfunctionprediction |
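
The description above says that per-protein binding residue predictions were mapped onto a FunFam to give a consensus prediction, and that varying how many proteins had to agree trades precision against recall. The following is a minimal Python sketch of that idea, not the authors' implementation. The abstract does not state the voting rule, so the fraction-of-votes threshold is an assumption, as are the function names `consensus_binding` and `precision_recall_f1`, the alignment encoding (`True` = predicted binding, `False` = predicted non-binding, `None` = gap), and the toy data.

```python
from typing import Dict, List, Optional, Tuple

Column = Optional[bool]  # True = binding, False = non-binding, None = gap/unaligned


def consensus_binding(aligned: Dict[str, List[Column]],
                      threshold: float = 0.5) -> List[Column]:
    """Assumed rule: a column is 'binding' when the fraction of aligned
    family members predicted as binding is at least `threshold`."""
    lengths = {len(row) for row in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all rows must share one alignment length")
    n_cols = lengths.pop()
    consensus: List[Column] = []
    for col in range(n_cols):
        # Only members with a residue in this column get a vote.
        votes = [row[col] for row in aligned.values() if row[col] is not None]
        if not votes:
            consensus.append(None)  # nothing aligned here, no call
        else:
            consensus.append(sum(votes) / len(votes) >= threshold)
    return consensus


def precision_recall_f1(predicted: List[Column],
                        observed: List[Column]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the 'binding' class; gaps are skipped."""
    tp = fp = fn = 0
    for p, o in zip(predicted, observed):
        if p is None or o is None:
            continue
        if p and o:
            tp += 1
        elif p and not o:
            fp += 1
        elif o and not p:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy family of three aligned members (hypothetical data).
family = {
    "member_a": [True, True, False, None],
    "member_b": [True, False, False, True],
    "member_c": [True, False, None, False],
}
observed = [True, True, False, False]  # hypothetical annotation

loose = consensus_binding(family, threshold=0.5)
strict = consensus_binding(family, threshold=1.0)
print(loose, precision_recall_f1(loose, observed))
print(strict, precision_recall_f1(strict, observed))
```

In this toy case the stricter threshold drops one false positive, so precision rises while recall stays the same. This mirrors the trade-off the description reports only qualitatively, and the numbers here have no relation to the published results.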