CRYSTALP2: sequence-based protein crystallization propensity prediction

BACKGROUND: Current protocols yield crystals for <30% of known proteins, indicating that automatically identifying crystallizable proteins may improve high-throughput structural genomics efforts. We introduce CRYSTALP2, a kernel-based method that predicts the propensity of a given protein sequenc...

Bibliographic Details
Main authors: Kurgan, Lukasz, Razib, Ali A, Aghakhani, Sara, Dick, Scott, Mizianty, Marcin, Jahandideh, Samad
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2731098/
https://www.ncbi.nlm.nih.gov/pubmed/19646256
http://dx.doi.org/10.1186/1472-6807-9-50
author Kurgan, Lukasz
Razib, Ali A
Aghakhani, Sara
Dick, Scott
Mizianty, Marcin
Jahandideh, Samad
collection PubMed
description BACKGROUND: Current protocols yield crystals for <30% of known proteins, indicating that automatically identifying crystallizable proteins may improve high-throughput structural genomics efforts. We introduce CRYSTALP2, a kernel-based method that predicts the propensity of a given protein sequence to produce diffraction-quality crystals. This method utilizes the composition and collocation of amino acids, isoelectric point, and hydrophobicity, as estimated from the primary sequence, to generate predictions. CRYSTALP2 extends its predecessor, CRYSTALP, by enabling predictions for sequences of unrestricted size and provides improved prediction quality.
RESULTS: A significant majority of the collocations used by CRYSTALP2 include residues with high conformational entropy, or low entropy and high potential to mediate crystal contacts; notably, such residues are utilized by surface entropy reduction methods. We show that the collocations provide complementary information to the hydrophobicity and isoelectric point. Tests on four datasets show that CRYSTALP2 outperforms several existing sequence-based predictors (CRYSTALP, OB-score, and SECRET). CRYSTALP2's accuracy, MCC, and AROC range between 69.3 and 77.5%, 0.39 and 0.55, and 0.72 and 0.79, respectively. Our predictions are similar in quality and are complementary to the predictions of the most recent ParCrys and XtalPred methods. Our results also suggest that, as work in protein crystallization continues (thereby enlarging the population of proteins with known crystallization propensities), the prediction quality of the CRYSTALP2 method should increase. The prediction model and the datasets used in this contribution can be downloaded from .
CONCLUSION: CRYSTALP2 provides relatively accurate crystallization propensity predictions for a given protein chain that either outperform or complement the existing approaches. The proposed method can be used to support current efforts towards improving the success rate in obtaining diffraction-quality crystals.
format Text
id pubmed-2731098
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2731098 2009-08-24 CRYSTALP2: sequence-based protein crystallization propensity prediction Kurgan, Lukasz; Razib, Ali A; Aghakhani, Sara; Dick, Scott; Mizianty, Marcin; Jahandideh, Samad BMC Struct Biol Methodology Article BACKGROUND: Current protocols yield crystals for <30% of known proteins, indicating that automatically identifying crystallizable proteins may improve high-throughput structural genomics efforts. We introduce CRYSTALP2, a kernel-based method that predicts the propensity of a given protein sequence to produce diffraction-quality crystals. This method utilizes the composition and collocation of amino acids, isoelectric point, and hydrophobicity, as estimated from the primary sequence, to generate predictions. CRYSTALP2 extends its predecessor, CRYSTALP, by enabling predictions for sequences of unrestricted size and provides improved prediction quality. RESULTS: A significant majority of the collocations used by CRYSTALP2 include residues with high conformational entropy, or low entropy and high potential to mediate crystal contacts; notably, such residues are utilized by surface entropy reduction methods. We show that the collocations provide complementary information to the hydrophobicity and isoelectric point. Tests on four datasets show that CRYSTALP2 outperforms several existing sequence-based predictors (CRYSTALP, OB-score, and SECRET). CRYSTALP2's accuracy, MCC, and AROC range between 69.3 and 77.5%, 0.39 and 0.55, and 0.72 and 0.79, respectively. Our predictions are similar in quality and are complementary to the predictions of the most recent ParCrys and XtalPred methods. Our results also suggest that, as work in protein crystallization continues (thereby enlarging the population of proteins with known crystallization propensities), the prediction quality of the CRYSTALP2 method should increase. The prediction model and the datasets used in this contribution can be downloaded from . CONCLUSION: CRYSTALP2 provides relatively accurate crystallization propensity predictions for a given protein chain that either outperform or complement the existing approaches. The proposed method can be used to support current efforts towards improving the success rate in obtaining diffraction-quality crystals. BioMed Central 2009-07-31 /pmc/articles/PMC2731098/ /pubmed/19646256 http://dx.doi.org/10.1186/1472-6807-9-50 Text en Copyright © 2009 Kurgan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title CRYSTALP2: sequence-based protein crystallization propensity prediction
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2731098/
https://www.ncbi.nlm.nih.gov/pubmed/19646256
http://dx.doi.org/10.1186/1472-6807-9-50