
Accurate Prediction of Protein Structural Class

Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally selec...


Bibliographic Details
Main Authors:	Xia, Xia-Yu; Ge, Meng; Wang, Zhi-Xin; Pan, Xian-Ming
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2012
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378576/ https://www.ncbi.nlm.nih.gov/pubmed/22723837 http://dx.doi.org/10.1371/journal.pone.0037653

author	Xia, Xia-Yu; Ge, Meng; Wang, Zhi-Xin; Pan, Xian-Ming
collection	PubMed
description	Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and then feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1% when tested on a widely used, non-homologous benchmark dataset, 25PDB. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position Specific Scoring Matrix (PSSM) of a protein domain to a 4-dimensional (4D) structural feature vector, which could then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all of the existing sequence-based methods, with an overall accuracy of 83.1%, which is even higher than that of methods based on predicted secondary structure.
format	Online Article Text
id	pubmed-3378576
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-3378576 2012-06-21 Accurate Prediction of Protein Structural Class Xia, Xia-Yu; Ge, Meng; Wang, Zhi-Xin; Pan, Xian-Ming PLoS One Research Article Public Library of Science 2012-06-19 /pmc/articles/PMC3378576/ /pubmed/22723837 http://dx.doi.org/10.1371/journal.pone.0037653 Text en Xia et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Accurate Prediction of Protein Structural Class
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378576/ https://www.ncbi.nlm.nih.gov/pubmed/22723837 http://dx.doi.org/10.1371/journal.pone.0037653
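
The pipeline summarized in the description field (a multiple linear regression that maps a 440D PSSM-derived feature vector to a 4D structural vector, which is then used to pick one of four classes) can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation: the record does not say how the 440 features are built from the PSSM or how the 4D vector is turned into a class, so the example uses synthetic random features, one-hot class targets, and an argmax decision rule. All function names, the class labels, and the data generator are hypothetical.

```python
import numpy as np

# Assumed labels for the "four major structural classes"; the record does not name them.
CLASSES = ["all-alpha", "all-beta", "alpha/beta", "alpha+beta"]


def add_bias(X):
    """Append a constant column so the regression has an intercept term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_mlr(X, labels, n_classes=4):
    """Fit one least-squares linear regression per class against one-hot targets.

    X: (n_samples, 440) feature matrix; labels: integer class indices.
    Returns a weight matrix of shape (441, n_classes), bias row included.
    """
    Y = np.eye(n_classes)[labels]              # one-hot targets (an assumption)
    W, *_ = np.linalg.lstsq(add_bias(X), Y, rcond=None)
    return W


def to_structural_vector(W, X):
    """Map 440D sequence features to the 4D structural feature vector."""
    return add_bias(X) @ W


def predict_class(W, X):
    """Pick the class with the largest 4D component (assumed decision rule)."""
    return np.argmax(to_structural_vector(W, X), axis=1)


def make_synthetic(n, n_features=440, n_classes=4, seed=0):
    """Synthetic stand-in for PSSM features: one Gaussian cluster per class."""
    centers = np.random.default_rng(12345).normal(size=(n_classes, n_features))
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_classes, size=n)
    X = centers[labels] + rng.normal(size=(n, n_features))
    return X, labels


if __name__ == "__main__":
    X_train, y_train = make_synthetic(2000, seed=1)
    X_test, y_test = make_synthetic(500, seed=2)
    W = fit_mlr(X_train, y_train)
    predicted = predict_class(W, X_test)
    print("accuracy on synthetic data:", (predicted == y_test).mean())
    print("first prediction:", CLASSES[predicted[0]])
```

Accuracy on this synthetic data says nothing about real protein domains; reproducing the reported figures would require the actual PSSM features and the paper's own cross-validation protocol.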
