GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains
GeMMA (Genome Modelling and Model Annotation) is a new approach to automatic functional subfamily classification within families and superfamilies of protein sequences. A major advantage of GeMMA is its ability to subclassify very large and diverse superfamilies with tens of thousands of members, wi...
Main authors: | Lee, David A.; Rentzsch, Robert; Orengo, Christine |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2010 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2817468/ https://www.ncbi.nlm.nih.gov/pubmed/19923231 http://dx.doi.org/10.1093/nar/gkp1049 |
_version_ | 1782177202964529152 |
---|---|
author | Lee, David A. Rentzsch, Robert Orengo, Christine |
author_facet | Lee, David A. Rentzsch, Robert Orengo, Christine |
author_sort | Lee, David A. |
collection | PubMed |
description | GeMMA (Genome Modelling and Model Annotation) is a new approach to automatic functional subfamily classification within families and superfamilies of protein sequences. A major advantage of GeMMA is its ability to subclassify very large and diverse superfamilies with tens of thousands of members, without the need for an initial multiple sequence alignment. Its performance is shown to be comparable to the established high-performance method SCI-PHY. GeMMA follows an agglomerative clustering protocol that uses existing software for sensitive and accurate multiple sequence alignment and profile–profile comparison. The produced subfamilies are shown to be equivalent in quality whether whole protein sequences are used or just the sequences of component predicted structural domains. A faster, heuristic version of GeMMA that also uses distributed computing is shown to maintain the performance levels of the original implementation. The use of GeMMA to increase the functional annotation coverage of functionally diverse Pfam families is demonstrated. It is further shown how GeMMA clusters can help to predict the impact of experimentally determining a protein domain structure on comparative protein modelling coverage, in the context of structural genomics. |
format | Text |
id | pubmed-2817468 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-28174682010-02-08 GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains Lee, David A. Rentzsch, Robert Orengo, Christine Nucleic Acids Res Computational Biology GeMMA (Genome Modelling and Model Annotation) is a new approach to automatic functional subfamily classification within families and superfamilies of protein sequences. A major advantage of GeMMA is its ability to subclassify very large and diverse superfamilies with tens of thousands of members, without the need for an initial multiple sequence alignment. Its performance is shown to be comparable to the established high-performance method SCI-PHY. GeMMA follows an agglomerative clustering protocol that uses existing software for sensitive and accurate multiple sequence alignment and profile–profile comparison. The produced subfamilies are shown to be equivalent in quality whether whole protein sequences are used or just the sequences of component predicted structural domains. A faster, heuristic version of GeMMA that also uses distributed computing is shown to maintain the performance levels of the original implementation. The use of GeMMA to increase the functional annotation coverage of functionally diverse Pfam families is demonstrated. It is further shown how GeMMA clusters can help to predict the impact of experimentally determining a protein domain structure on comparative protein modelling coverage, in the context of structural genomics. Oxford University Press 2010-01 2009-11-18 /pmc/articles/PMC2817468/ /pubmed/19923231 http://dx.doi.org/10.1093/nar/gkp1049 Text en © The Author(s) 2009. Published by Oxford University Press. 
http://creativecommons.org/licenses/by-nc/2.5/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Computational Biology Lee, David A. Rentzsch, Robert Orengo, Christine GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains |
title | GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains |
title_full | GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains |
title_fullStr | GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains |
title_full_unstemmed | GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains |
title_short | GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains |
title_sort | gemma: functional subfamily classification within superfamilies of predicted protein structural domains |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2817468/ https://www.ncbi.nlm.nih.gov/pubmed/19923231 http://dx.doi.org/10.1093/nar/gkp1049 |
work_keys_str_mv | AT leedavida gemmafunctionalsubfamilyclassificationwithinsuperfamiliesofpredictedproteinstructuraldomains AT rentzschrobert gemmafunctionalsubfamilyclassificationwithinsuperfamiliesofpredictedproteinstructuraldomains AT orengochristine gemmafunctionalsubfamilyclassificationwithinsuperfamiliesofpredictedproteinstructuraldomains |