
PredPPCrys: Accurate Prediction of Sequence Cloning, Protein Production, Purification and Crystallization Propensity from Protein Sequences Using Multi-Step Heterogeneous Feature Fusion and Selection

X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purific...


Bibliographic Details
Main Authors: Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4141844/
https://www.ncbi.nlm.nih.gov/pubmed/25148528
http://dx.doi.org/10.1371/journal.pone.0105902
_version_ 1782331704819580928
author Wang, Huilin
Wang, Mingjun
Tan, Hao
Li, Yuan
Zhang, Ziding
Song, Jiangning
collection PubMed
description X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.
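The two-level design described in the abstract (first-level SVM outputs feeding a second-level SVM) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the feature group names (`composition`, `physchem`), the toy values, the use of one first-level model per feature group, and the small linear SVM trained by stochastic subgradient descent on the hinge loss. The abstract does not state the kernel, the features, or how first-level outputs were generated for training, so this sketch simply reuses in-sample decision values.

```python
import random


def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) by stochastic
    subgradient descent. X is a list of feature lists, y holds labels
    in {-1, +1}. Returns (weights, bias)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # L2 shrinkage is applied on every step.
            w = [wj - lr * lam * wj for wj in w]
            if y[i] * score < 1:
                # Sample is inside the margin: follow the hinge subgradient.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def decision_value(model, x):
    """Signed score of sample x under a (weights, bias) model."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_two_level(feature_groups, y):
    """feature_groups maps a (hypothetical) group name to a list of
    feature vectors, one per protein. Level one trains one SVM per group;
    level two trains an SVM on the level-one decision values."""
    names = sorted(feature_groups)
    level1 = {n: train_linear_svm(feature_groups[n], y) for n in names}
    stacked = [
        [decision_value(level1[n], feature_groups[n][i]) for n in names]
        for i in range(len(y))
    ]
    level2 = train_linear_svm(stacked, y)
    return names, level1, level2


def predict_two_level(fitted, sample):
    """sample maps each group name to that protein's feature vector."""
    names, level1, level2 = fitted
    z = [decision_value(level1[n], sample[n]) for n in names]
    return 1 if decision_value(level2, z) >= 0 else -1


# Toy data: four proteins, two hypothetical feature groups.
groups = {
    "composition": [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]],
    "physchem": [[1.0], [0.7], [-0.6], [-1.0]],
}
labels = [1, 1, -1, -1]
fitted = fit_two_level(groups, labels)
pred = predict_two_level(fitted, {"composition": [0.85, 0.15], "physchem": [0.9]})
```

In practice the second level would usually be trained on held-out (cross-validated) first-level outputs to limit overfitting; the sketch skips that step for brevity.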
format Online
Article
Text
id pubmed-4141844
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4141844 2014-08-25 PLoS One Research Article Public Library of Science 2014-08-22 /pmc/articles/PMC4141844/ /pubmed/25148528 http://dx.doi.org/10.1371/journal.pone.0105902 Text en © 2014 Wang et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title PredPPCrys: Accurate Prediction of Sequence Cloning, Protein Production, Purification and Crystallization Propensity from Protein Sequences Using Multi-Step Heterogeneous Feature Fusion and Selection
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4141844/
https://www.ncbi.nlm.nih.gov/pubmed/25148528
http://dx.doi.org/10.1371/journal.pone.0105902