CATHEDRAL: A Fast and Effective Algorithm to Predict Folds and Domain Boundaries from Multidomain Protein Structures

We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL builds on the features of a fast secondary-structure–based method (using graph theory) to locate known folds within a multidomain context a...

Bibliographic Details
Main authors:	Redfern, Oliver C; Harrison, Andrew; Dallman, Tim; Pearl, Frances M. G; Orengo, Christine A
Format:	Text
Language:	English
Published:	Public Library of Science 2007
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2098860/ https://www.ncbi.nlm.nih.gov/pubmed/18052539 http://dx.doi.org/10.1371/journal.pcbi.0030232

author	Redfern, Oliver C Harrison, Andrew Dallman, Tim Pearl, Frances M. G Orengo, Christine A
collection	PubMed
description	We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL builds on the features of a fast secondary-structure–based method (using graph theory) to locate known folds within a multidomain context and a residue-based, double-dynamic programming algorithm, which is used to align members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries. To increase the fidelity of the assignments, a support vector machine is used to provide an optimal scoring scheme. Once a domain is verified, it is excised, and the search protocol is repeated in an iterative fashion until all recognisable domains have been identified. We have performed an initial benchmark of CATHEDRAL against other publicly available structure comparison methods using a consensus dataset of domains derived from the CATH and SCOP domain classifications. CATHEDRAL shows superior performance in fold recognition and alignment accuracy when compared with many equivalent methods. If a novel multidomain structure contains a known fold, CATHEDRAL will locate it in 90% of cases, with <1% false positives. For nearly 80% of assigned domains in a manually validated test set, the boundaries were correctly delineated within a tolerance of ten residues. For the remaining cases, previously classified domains were very remotely related to the query chain so that embellishments to the core of the fold caused significant differences in domain sizes and manual refinement of the boundaries was necessary. To put this performance in context, a well-established sequence method based on hidden Markov models was only able to detect 65% of domains, with 33% of the subsequent boundaries assigned within ten residues. Since, on average, 50% of newly determined protein structures contain more than one domain unit, and typically 90% or more of these domains are already classified in CATH, CATHEDRAL will considerably facilitate the automation of protein structure classification.
format	Text
id	pubmed-2098860
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-2098860 2007-11-29 PLoS Comput Biol Research Article Public Library of Science 2007-11 2007-11-30 /pmc/articles/PMC2098860/ /pubmed/18052539 http://dx.doi.org/10.1371/journal.pcbi.0030232 Text en © 2007 Redfern et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	CATHEDRAL: A Fast and Effective Algorithm to Predict Folds and Domain Boundaries from Multidomain Protein Structures
topic	Research Article
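
The abstract describes an iterative protocol: search for a known fold, accept the best-scoring match, excise it, and repeat until no recognisable domain remains. The control flow can be illustrated with a minimal Python sketch. Everything in it is a toy assumption, not the authors' implementation: the query is a string of secondary-structure letters, each library fold is a short template string, exact substring matching stands in for the graph-based search and double-dynamic programming alignment, and the template length stands in for the support vector machine score. The names `best_hit`, `assign_domains`, `within_tolerance`, and `min_score` are hypothetical. The tolerance check assumes that both the start and the end of a domain must lie within ten residues of the reference, which the abstract does not spell out.

```python
from typing import Dict, List, Optional, Tuple

# (fold name, start index, end index inclusive, score); all hypothetical
Hit = Tuple[str, int, int, float]


def best_hit(masked: str, library: Dict[str, str]) -> Optional[Hit]:
    """Toy stand-in for the fold search, alignment, and scoring steps."""
    best: Optional[Hit] = None
    for name, template in library.items():
        if not template:
            continue  # an empty template would match everywhere
        start = masked.find(template)
        if start == -1:
            continue
        score = float(len(template))  # placeholder for a learned score
        if best is None or score > best[3]:
            best = (name, start, start + len(template) - 1, score)
    return best


def assign_domains(query: str, library: Dict[str, str],
                   min_score: float = 3.0) -> List[Hit]:
    """Find the best match, excise it, and repeat until nothing is left."""
    masked = query
    domains: List[Hit] = []
    while True:
        hit = best_hit(masked, library)
        if hit is None or hit[3] < min_score:
            break
        domains.append(hit)
        _, start, end, _ = hit
        # "Excise" the domain by masking its residues, so that the
        # remaining residues keep their original numbering.
        masked = masked[:start] + "-" * (end - start + 1) + masked[end + 1:]
    return sorted(domains, key=lambda h: h[1])


def within_tolerance(predicted: Tuple[int, int], reference: Tuple[int, int],
                     tolerance: int = 10) -> bool:
    """True if both boundaries are within `tolerance` residues (assumed rule)."""
    return (abs(predicted[0] - reference[0]) <= tolerance
            and abs(predicted[1] - reference[1]) <= tolerance)


if __name__ == "__main__":
    toy_library = {"alpha": "HHHHH", "beta": "EEEEE"}
    for hit in assign_domains("HHHHHEEEEEHHHEEE", toy_library):
        print(hit)
```

Masking rather than deleting the excised residues is a design choice of this sketch: it keeps residue numbering stable across iterations, so the reported boundaries always refer to the original chain.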
