
Genomic scale sub-family assignment of protein domains

Many classification schemes for proteins and domains are either hierarchical or semi-hierarchical yet most databases, especially those offering genome-wide analysis, only provide assignments to sequences at one level of their hierarchy. Given an established hierarchy, the problem of assigning new se...


Bibliographic Details
Main author: Gough, Julian
Format: Text
Language: English
Published: Oxford University Press 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1540727/
https://www.ncbi.nlm.nih.gov/pubmed/16877569
http://dx.doi.org/10.1093/nar/gkl484
_version_ 1782129178508787712
author Gough, Julian
author_facet Gough, Julian
author_sort Gough, Julian
collection PubMed
description Many classification schemes for proteins and domains are either hierarchical or semi-hierarchical yet most databases, especially those offering genome-wide analysis, only provide assignments to sequences at one level of their hierarchy. Given an established hierarchy, the problem of assigning new sequences to lower levels of that existing hierarchy is less hard (but no less important) than the initial top level assignment which requires the detection of the most distant relationships. A solution to this problem is described here in the form of a new procedure which can be thought of as a hybrid between pairwise and profile methods. The hybrid method is a general procedure that can be applied to any pre-defined hierarchy, at any level, including in principle multiple sub-levels. It has been tested on the SCOP classification via the SUPERFAMILY database and performs significantly better than either pairwise or profile methods alone. Perhaps the greatest advantage of the hybrid method over other possible approaches to the problem is that within the framework of an existing profile library, the assignments are fully automatic and come at almost no additional computational cost. Hence it has already been applied at the SCOP family level to all genomes in the SUPERFAMILY database, providing a wealth of new data to the biological and bioinformatics communities.
format Text
id pubmed-1540727
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-1540727 2006-08-24 Genomic scale sub-family assignment of protein domains Gough, Julian Nucleic Acids Res Article Many classification schemes for proteins and domains are either hierarchical or semi-hierarchical yet most databases, especially those offering genome-wide analysis, only provide assignments to sequences at one level of their hierarchy. Given an established hierarchy, the problem of assigning new sequences to lower levels of that existing hierarchy is less hard (but no less important) than the initial top level assignment which requires the detection of the most distant relationships. A solution to this problem is described here in the form of a new procedure which can be thought of as a hybrid between pairwise and profile methods. The hybrid method is a general procedure that can be applied to any pre-defined hierarchy, at any level, including in principle multiple sub-levels. It has been tested on the SCOP classification via the SUPERFAMILY database and performs significantly better than either pairwise or profile methods alone. Perhaps the greatest advantage of the hybrid method over other possible approaches to the problem is that within the framework of an existing profile library, the assignments are fully automatic and come at almost no additional computational cost. Hence it has already been applied at the SCOP family level to all genomes in the SUPERFAMILY database, providing a wealth of new data to the biological and bioinformatics communities. Oxford University Press 2006 2006-07-28 /pmc/articles/PMC1540727/ /pubmed/16877569 http://dx.doi.org/10.1093/nar/gkl484 Text en © 2006 The Author(s)
spellingShingle Article
Gough, Julian
Genomic scale sub-family assignment of protein domains
title Genomic scale sub-family assignment of protein domains
title_full Genomic scale sub-family assignment of protein domains
title_fullStr Genomic scale sub-family assignment of protein domains
title_full_unstemmed Genomic scale sub-family assignment of protein domains
title_short Genomic scale sub-family assignment of protein domains
title_sort genomic scale sub-family assignment of protein domains
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1540727/
https://www.ncbi.nlm.nih.gov/pubmed/16877569
http://dx.doi.org/10.1093/nar/gkl484
work_keys_str_mv AT goughjulian genomicscalesubfamilyassignmentofproteindomains