PSPP: A Protein Structure Prediction Pipeline for Computing Clusters

BACKGROUND: Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational pr...

Bibliographic Details
Main authors:	Lee, Michael S.; Bondugula, Rajkumar; Desai, Valmik; Zavaljevski, Nela; Yeh, In-Chul; Wallqvist, Anders; Reifman, Jaques
Format:	Text
Language:	English
Published:	Public Library of Science 2009
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2707601/ https://www.ncbi.nlm.nih.gov/pubmed/19606223 http://dx.doi.org/10.1371/journal.pone.0006254

author	Lee, Michael S.; Bondugula, Rajkumar; Desai, Valmik; Zavaljevski, Nela; Yeh, In-Chul; Wallqvist, Anders; Reifman, Jaques
author_sort	Lee, Michael S.
collection	PubMed
description	BACKGROUND: Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster. METHODOLOGY/PRINCIPAL FINDINGS: The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes. 
CONCLUSIONS: The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform resource-intensive ab initio structure prediction.
format	Text
id	pubmed-2707601
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-2707601 2009-07-16 PSPP: A Protein Structure Prediction Pipeline for Computing Clusters Lee, Michael S.; Bondugula, Rajkumar; Desai, Valmik; Zavaljevski, Nela; Yeh, In-Chul; Wallqvist, Anders; Reifman, Jaques PLoS One Research Article Public Library of Science 2009-07-16 /pmc/articles/PMC2707601/ /pubmed/19606223 http://dx.doi.org/10.1371/journal.pone.0006254 Text en This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. https://creativecommons.org/publicdomain/zero/1.0/
title	PSPP: A Protein Structure Prediction Pipeline for Computing Clusters
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2707601/ https://www.ncbi.nlm.nih.gov/pubmed/19606223 http://dx.doi.org/10.1371/journal.pone.0006254
