
Accurate Prediction of Protein Structural Class

Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally selec...

Bibliographic Details
Main authors: Xia, Xia-Yu, Ge, Meng, Wang, Zhi-Xin, Pan, Xian-Ming
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378576/
https://www.ncbi.nlm.nih.gov/pubmed/22723837
http://dx.doi.org/10.1371/journal.pone.0037653
_version_ 1782236060672065536
author Xia, Xia-Yu
Ge, Meng
Wang, Zhi-Xin
Pan, Xian-Ming
author_facet Xia, Xia-Yu
Ge, Meng
Wang, Zhi-Xin
Pan, Xian-Ming
author_sort Xia, Xia-Yu
collection PubMed
description Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and then feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1% when tested on the widely used, non-homologous benchmark dataset 25PDB. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position Specific Scoring Matrix (PSSM) of a protein domain to a 4-dimensional (4D) structural feature vector, which could then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all of the existing sequence-based methods, with an overall accuracy of 83.1%, which is even higher than that of methods based on predicted secondary structure.
format Online
Article
Text
id pubmed-3378576
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-33785762012-06-21 Accurate Prediction of Protein Structural Class Xia, Xia-Yu Ge, Meng Wang, Zhi-Xin Pan, Xian-Ming PLoS One Research Article Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and then feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1% when tested on the widely used, non-homologous benchmark dataset 25PDB. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position Specific Scoring Matrix (PSSM) of a protein domain to a 4-dimensional (4D) structural feature vector, which could then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all of the existing sequence-based methods, with an overall accuracy of 83.1%, which is even higher than that of methods based on predicted secondary structure. Public Library of Science 2012-06-19 /pmc/articles/PMC3378576/ /pubmed/22723837 http://dx.doi.org/10.1371/journal.pone.0037653 Text en Xia et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Xia, Xia-Yu
Ge, Meng
Wang, Zhi-Xin
Pan, Xian-Ming
Accurate Prediction of Protein Structural Class
title Accurate Prediction of Protein Structural Class
title_full Accurate Prediction of Protein Structural Class
title_fullStr Accurate Prediction of Protein Structural Class
title_full_unstemmed Accurate Prediction of Protein Structural Class
title_short Accurate Prediction of Protein Structural Class
title_sort accurate prediction of protein structural class
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378576/
https://www.ncbi.nlm.nih.gov/pubmed/22723837
http://dx.doi.org/10.1371/journal.pone.0037653
work_keys_str_mv AT xiaxiayu accuratepredictionofproteinstructuralclass
AT gemeng accuratepredictionofproteinstructuralclass
AT wangzhixin accuratepredictionofproteinstructuralclass
AT panxianming accuratepredictionofproteinstructuralclass
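
Illustrative note on the method summarized in the description field: the abstract outlines a pipeline in which a multiple linear regression (MLR) model maps a 440D PSSM-derived feature vector to a 4D vector that is then used to assign one of four structural classes. The sketch below is only a toy illustration of that general idea, not the authors' implementation. Everything specific in it is an assumption: the one-hot regression targets, the intercept term, the argmax decision rule, the hand-made 3D features standing in for the 440D PSSM features, the class names (the usual SCOP class names, which this record does not list), and all function names.

```python
# Toy sketch (assumptions throughout): multiple linear regression from a
# feature vector to a 4D class-score vector, followed by an argmax decision.
# Real inputs would be 440D PSSM-derived features; 3D toy features are used here.

# Hypothetical class labels; the record only says "four major structural classes".
CLASSES = ["all-alpha", "all-beta", "alpha/beta", "alpha+beta"]

# Hand-made training data: two toy domains per class (indices into CLASSES).
TOY_FEATURES = [
    (1.0, 0.0, 0.0), (0.9, 0.1, 0.0),
    (0.0, 1.0, 0.0), (0.1, 0.9, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.1, 0.9),
    (0.0, 0.0, 0.0), (0.1, 0.1, 0.1),
]
TOY_LABELS = [0, 0, 1, 1, 2, 2, 3, 3]


def solve(a, b):
    """Solve a @ x = b (a: n x n, b: n x m) by Gauss-Jordan elimination
    with partial pivoting. Inputs and output are lists of lists."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_mlr(features, labels, n_classes=4):
    """Least-squares fit of one linear model per class via the normal
    equations (X^T X) W = X^T Y, with one-hot targets Y."""
    x = [[1.0] + list(f) for f in features]  # prepend intercept column
    y = [[1.0 if k == lab else 0.0 for k in range(n_classes)] for lab in labels]
    d = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(d)] for i in range(d)]
    xty = [[sum(r[i] * t[k] for r, t in zip(x, y)) for k in range(n_classes)]
           for i in range(d)]
    return solve(xtx, xty)  # weight matrix, shape d x n_classes


def to_4d(weights, feature):
    """Map one feature vector to its 4D class-score vector."""
    x = [1.0] + list(feature)
    return [sum(xi * w[k] for xi, w in zip(x, weights))
            for k in range(len(weights[0]))]


def predict(weights, feature):
    """Return the index of the class with the highest regression score."""
    scores = to_4d(weights, feature)
    return max(range(len(scores)), key=scores.__getitem__)


weights = fit_mlr(TOY_FEATURES, TOY_LABELS)
for feature in TOY_FEATURES:
    print(feature, "->", CLASSES[predict(weights, feature)])
```

With real data, the normal equations for 440 features would more typically be solved with a numerical library, and accuracy would be estimated by cross-validation or jackknife testing as the abstract describes, rather than on the training set.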