Predicting Protein Model Quality from Sequence Alignments by Support Vector Machines

Assessing the quality of a protein structure model is essential for protein structure prediction. Here, we developed a Support Vector Machine (SVM) method to predict the quality score (GDT-TS score) of a protein structure model from the features extracted from the sequence alignment used to generate...

Bibliographic Details
Main authors: Deng, Xin; Li, Jilong; Cheng, Jianlin
Format: Online Article Text
Language: English
Published: 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4705550/
https://www.ncbi.nlm.nih.gov/pubmed/26752865
http://dx.doi.org/10.4172/jpb.S9-001
author Deng, Xin
Li, Jilong
Cheng, Jianlin
collection PubMed
description Assessing the quality of a protein structure model is essential for protein structure prediction. Here, we developed a Support Vector Machine (SVM) method to predict the quality score (GDT-TS score) of a protein structure model from features extracted from the sequence alignment used to generate the model. The method takes either a query-single-template pairwise alignment or a query-multi-template alignment as input. For the pairwise alignment scheme, the input features fed into the SVM predictor include the normalized e-value of the given alignment, the percentage of identical residue pairs in the alignment, the percentage of residues of the query aligned with those of the template, and the sum of the BLOSUM scores of all aligned residues divided by the number of aligned positions. Similarly, for the multiple-alignment scheme, the input features include the percentage of residues of the target sequence aligned with those in one or more templates, the percentage of aligned residues of the target sequence that are identical to the corresponding residue in any one template, the average BLOSUM score of aligned residues, and the average Gonnet160 score of aligned residues. An SVM regression predictor was trained on the training data to predict the GDT-TS scores of the models from the input features. The Root Mean Square Error (RMSE) and the Absolute Mean Error (ABS) between predicted and real GDT-TS scores were calculated to evaluate performance. Five-fold cross-validation was applied to select the best parameter values based on the average RMSE and ABS over the five folds. The RMSE and ABS of the optimized SVM predictor on the testing data were both close to 0.1. The good performance of this predictor indicates that integrating sequence alignment features with an SVM is effective for protein model quality assessment.
format Online
Article
Text
id pubmed-4705550
institution National Center for Biotechnology Information
language English
publishDate 2013
record_format MEDLINE/PubMed
spelling pubmed-4705550 2016-01-08 Predicting Protein Model Quality from Sequence Alignments by Support Vector Machines Deng, Xin Li, Jilong Cheng, Jianlin J Proteomics Bioinform Article 2013-11-04 2013-11-04 /pmc/articles/PMC4705550/ /pubmed/26752865 http://dx.doi.org/10.4172/jpb.S9-001 Text en http://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Predicting Protein Model Quality from Sequence Alignments by Support Vector Machines
topic Article
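
The pairwise-alignment features and the two error measures named in the description can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the gapped-string input format, the choice of aligned positions as the denominator for identity, and the toy match/mismatch scorer (a stand-in for a real BLOSUM lookup) are all assumptions made for illustration. The normalized e-value is omitted because it comes from the alignment tool rather than from the alignment strings.

```python
import math


def toy_score(a, b):
    # Hypothetical stand-in for a BLOSUM lookup: +1 for a match, -1 otherwise.
    return 1 if a == b else -1


def pairwise_alignment_features(query_aln, template_aln, score_fn=toy_score):
    """Compute three features from two equal-length gapped strings ('-' = gap)."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    query_len = sum(1 for c in query_aln if c != "-")
    # Keep only columns where both query and template have a residue.
    pairs = [(q, t) for q, t in zip(query_aln, template_aln)
             if q != "-" and t != "-"]
    if query_len == 0 or not pairs:
        return {"identity": 0.0, "coverage": 0.0, "avg_score": 0.0}
    identical = sum(1 for q, t in pairs if q == t)
    return {
        # Assumption: identity is measured over aligned (non-gap) positions.
        "identity": identical / len(pairs),
        # Fraction of query residues aligned to a template residue.
        "coverage": len(pairs) / query_len,
        # Sum of substitution scores divided by the number of aligned positions.
        "avg_score": sum(score_fn(q, t) for q, t in pairs) / len(pairs),
    }


def rmse(predicted, real):
    """Root mean square error between predicted and real scores."""
    if len(predicted) != len(real) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, real)) / len(real))


def mean_abs_error(predicted, real):
    """Mean absolute error (called ABS in the description)."""
    if len(predicted) != len(real) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, real)) / len(real)
```

In a fuller pipeline, feature vectors of this kind would be passed to an SVM regressor and the two error functions applied to its predicted GDT-TS scores on each cross-validation fold; that training step is left out here to keep the sketch dependency-free.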