
Functional classification of CATH superfamilies: a domain-based approach for protein function annotation

Motivation: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and...


Bibliographic Details
Main authors: Das, Sayoni, Lee, David, Sillitoe, Ian, Dawson, Natalie L., Lees, Jonathan G., Orengo, Christine A.
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4612221/
https://www.ncbi.nlm.nih.gov/pubmed/26139634
http://dx.doi.org/10.1093/bioinformatics/btv398
_version_ 1782396140527812608
author Das, Sayoni
Lee, David
Sillitoe, Ian
Dawson, Natalie L.
Lees, Jonathan G.
Orengo, Christine A.
author_facet Das, Sayoni
Lee, David
Sillitoe, Ian
Dawson, Natalie L.
Lees, Jonathan G.
Orengo, Christine A.
author_sort Das, Sayoni
collection PubMed
description Motivation: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies. The superfamilies are sub-classified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer. Results: FunFHMMer generates more functionally coherent groupings of protein sequences than other domain-based protein classifications. This has been validated using known functional information. The conserved positions predicted by the FunFams are also found to be enriched in known functional residues. Moreover, the functional annotations provided by the FunFams are found to be more precise than other domain-based resources. FunFHMMer currently identifies 110 439 FunFams in 2735 superfamilies which can be used to functionally annotate > 16 million domain sequences. Availability and implementation: All FunFam annotation data are made available through the CATH webpages (http://www.cathdb.info). The FunFHMMer webserver (http://www.cathdb.info/search/by_funfhmmer) allows users to submit query sequences for assignment to a CATH FunFam. Contact: sayoni.das.12@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4612221
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-46122212015-10-22 Functional classification of CATH superfamilies: a domain-based approach for protein function annotation Das, Sayoni Lee, David Sillitoe, Ian Dawson, Natalie L. Lees, Jonathan G. Orengo, Christine A. Bioinformatics Original Papers Motivation: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies. The superfamilies are sub-classified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer. Results: FunFHMMer generates more functionally coherent groupings of protein sequences than other domain-based protein classifications. This has been validated using known functional information. The conserved positions predicted by the FunFams are also found to be enriched in known functional residues. Moreover, the functional annotations provided by the FunFams are found to be more precise than other domain-based resources. FunFHMMer currently identifies 110 439 FunFams in 2735 superfamilies which can be used to functionally annotate > 16 million domain sequences. Availability and implementation: All FunFam annotation data are made available through the CATH webpages (http://www.cathdb.info). The FunFHMMer webserver (http://www.cathdb.info/search/by_funfhmmer) allows users to submit query sequences for assignment to a CATH FunFam. Contact: sayoni.das.12@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2015-11-01 2015-07-02 /pmc/articles/PMC4612221/ /pubmed/26139634 http://dx.doi.org/10.1093/bioinformatics/btv398 Text en © The Author 2015. Published by Oxford University Press. 
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Papers
Das, Sayoni
Lee, David
Sillitoe, Ian
Dawson, Natalie L.
Lees, Jonathan G.
Orengo, Christine A.
Functional classification of CATH superfamilies: a domain-based approach for protein function annotation
title Functional classification of CATH superfamilies: a domain-based approach for protein function annotation
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4612221/
https://www.ncbi.nlm.nih.gov/pubmed/26139634
http://dx.doi.org/10.1093/bioinformatics/btv398