PCI-SS: MISO dynamic nonlinear protein secondary structure prediction

BACKGROUND: Since the function of a protein is largely dictated by its three dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one dimensional secondary structure of proteins (distinguishin...

Bibliographic Details
Main Authors:	Green, James R; Korenberg, Michael J; Aboul-Magd, Mohammed O
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2720391/ https://www.ncbi.nlm.nih.gov/pubmed/19615046 http://dx.doi.org/10.1186/1471-2105-10-222

author	Green, James R; Korenberg, Michael J; Aboul-Magd, Mohammed O
collection	PubMed
description	BACKGROUND: Since the function of a protein is largely dictated by its three-dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one-dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data, which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification. RESULTS: Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied in order to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems, and protein structure classifiers are built using 2 layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, here a web service is developed to provide both human- and machine-readable interfaces to PCI-based protein secondary structure prediction. This server, called PCI-SS, is available at . In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP) interface is added to permit invocation of the PCI-SS service remotely. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used to represent the input protein sequence data and also to encode the resulting structure prediction in a machine-readable format. To our knowledge, this represents the only publicly available SOAP interface for a protein secondary structure prediction service with a published WSDL interface definition. CONCLUSION: Relative to the 9 contemporary methods included in the comparison, cascaded PCI classifiers perform well; however, PCI finds greatest application as a consensus classifier. When PCI is used to combine a sequence-to-structure PCI-based classifier with the current leading ANN-based method, PSIPRED, the overall error rate (Q3) is maintained while the rate of occurrence of a particularly detrimental error is reduced by up to 25%. This improvement in BAD score, combined with the machine-readable SOAP web service interface, makes PCI-SS particularly useful for inclusion in a tertiary structure prediction pipeline.
format	Text
id	pubmed-2720391
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2720391 2009-08-04 PCI-SS: MISO dynamic nonlinear protein secondary structure prediction. Green, James R; Korenberg, Michael J; Aboul-Magd, Mohammed O. BMC Bioinformatics. Research Article. BioMed Central 2009-07-17 /pmc/articles/PMC2720391/ /pubmed/19615046 http://dx.doi.org/10.1186/1471-2105-10-222 Text en Copyright © 2009 Green et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	PCI-SS: MISO dynamic nonlinear protein secondary structure prediction
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2720391/ https://www.ncbi.nlm.nih.gov/pubmed/19615046 http://dx.doi.org/10.1186/1471-2105-10-222