
GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains

Bibliographic Details
Main authors: Lee, David A., Rentzsch, Robert, Orengo, Christine
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2817468/
https://www.ncbi.nlm.nih.gov/pubmed/19923231
http://dx.doi.org/10.1093/nar/gkp1049
collection PubMed
description GeMMA (Genome Modelling and Model Annotation) is a new approach to automatic functional subfamily classification within families and superfamilies of protein sequences. A major advantage of GeMMA is its ability to subclassify very large and diverse superfamilies with tens of thousands of members, without the need for an initial multiple sequence alignment. Its performance is shown to be comparable to the established high-performance method SCI-PHY. GeMMA follows an agglomerative clustering protocol that uses existing software for sensitive and accurate multiple sequence alignment and profile–profile comparison. The produced subfamilies are shown to be equivalent in quality whether whole protein sequences are used or just the sequences of component predicted structural domains. A faster, heuristic version of GeMMA that also uses distributed computing is shown to maintain the performance levels of the original implementation. The use of GeMMA to increase the functional annotation coverage of functionally diverse Pfam families is demonstrated. It is further shown how GeMMA clusters can help to predict the impact of experimentally determining a protein domain structure on comparative protein modelling coverage, in the context of structural genomics.
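The description above characterizes GeMMA as an agglomerative clustering protocol. A minimal sketch of such a loop follows: start from initial clusters (singletons in this sketch), repeatedly merge the most similar pair, and stop once no pair scores above a threshold. It is not GeMMA's implementation. GeMMA relies on existing multiple sequence alignment and profile–profile comparison software; here that step is replaced by a hypothetical `cluster_similarity` based on pooled k-mer Jaccard similarity, and the stopping `threshold` and k-mer length `k` are illustrative assumptions.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_similarity(a, b, k=3):
    """Hypothetical stand-in for profile-profile comparison (assumption):
    Jaccard similarity of the k-mers pooled over each cluster's sequences."""
    ka = set().union(*(kmers(s, k) for s in a))
    kb = set().union(*(kmers(s, k) for s in b))
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def agglomerate(seqs, threshold=0.3, k=3):
    """Greedy agglomerative clustering: merge the most similar pair of
    clusters until no pair reaches `threshold` (illustrative cut-off)."""
    clusters = [[s] for s in seqs]  # singleton starting clusters
    while len(clusters) > 1:
        # Score every pair of current clusters.
        scored = [(cluster_similarity(clusters[i], clusters[j], k), i, j)
                  for i, j in combinations(range(len(clusters)), 2)]
        # Pick the best-scoring pair (first one wins on ties).
        score, i, j = max(scored, key=lambda t: t[0])
        if score < threshold:
            break  # no pair is similar enough to merge
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters
```

For example, `agglomerate(["MKTAYIAKQR", "MKTAYIAKQL", "GGSWWHHPPE", "GGSWWHHPPD"])` returns `[['MKTAYIAKQR', 'MKTAYIAKQL'], ['GGSWWHHPPE', 'GGSWWHHPPD']]`: each pair of near-identical sequences merges, and the two resulting clusters share no k-mers, so merging stops there. Scoring all pairs on every iteration is quadratic per step; the description's faster heuristic version of GeMMA with distributed computing addresses cost at a scale this sketch does not attempt.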
id pubmed-2817468
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Nucleic Acids Res
section Computational Biology
published 2010-01 (issue); online 2009-11-18; record dated 2010-02-08
rights © The Author(s) 2009. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Computational Biology