Functional classification of CATH superfamilies: a domain-based approach for protein function annotation
Motivation: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and...
Main authors: | Das, Sayoni; Lee, David; Sillitoe, Ian; Dawson, Natalie L.; Lees, Jonathan G.; Orengo, Christine A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2015 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4612221/ https://www.ncbi.nlm.nih.gov/pubmed/26139634 http://dx.doi.org/10.1093/bioinformatics/btv398 |
_version_ | 1782396140527812608 |
---|---|
author | Das, Sayoni; Lee, David; Sillitoe, Ian; Dawson, Natalie L.; Lees, Jonathan G.; Orengo, Christine A. |
author_sort | Das, Sayoni |
collection | PubMed |
description | Motivation: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies. The superfamilies are sub-classified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer. Results: FunFHMMer generates more functionally coherent groupings of protein sequences than other domain-based protein classifications. This has been validated using known functional information. The conserved positions predicted by the FunFams are also found to be enriched in known functional residues. Moreover, the functional annotations provided by the FunFams are found to be more precise than other domain-based resources. FunFHMMer currently identifies 110 439 FunFams in 2735 superfamilies which can be used to functionally annotate > 16 million domain sequences. Availability and implementation: All FunFam annotation data are made available through the CATH webpages (http://www.cathdb.info). The FunFHMMer webserver (http://www.cathdb.info/search/by_funfhmmer) allows users to submit query sequences for assignment to a CATH FunFam. Contact: sayoni.das.12@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-4612221 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4612221 2015-10-22 Bioinformatics Original Papers Oxford University Press 2015-11-01 2015-07-02 /pmc/articles/PMC4612221/ /pubmed/26139634 http://dx.doi.org/10.1093/bioinformatics/btv398 Text en © The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Functional classification of CATH superfamilies: a domain-based approach for protein function annotation |
topic | Original Papers |