Accurate Prediction of Protein Structural Class
Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally selec...
Main authors: | Xia, Xia-Yu; Ge, Meng; Wang, Zhi-Xin; Pan, Xian-Ming |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2012 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378576/ https://www.ncbi.nlm.nih.gov/pubmed/22723837 http://dx.doi.org/10.1371/journal.pone.0037653 |
_version_ | 1782236060672065536 |
---|---|
author | Xia, Xia-Yu Ge, Meng Wang, Zhi-Xin Pan, Xian-Ming |
author_sort | Xia, Xia-Yu |
collection | PubMed |
description | Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and then feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1% when tested on a widely used, non-homologous benchmark dataset, 25PDB. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position Specific Scoring Matrix (PSSM) of a protein domain to a 4-dimensional (4D) structural feature vector, which could then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all of the existing sequence-based methods and had an overall accuracy of 83.1%, which is even higher than the results of methods based on predicted secondary structure. |
format | Online Article Text |
id | pubmed-3378576 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3378576 2012-06-21 Accurate Prediction of Protein Structural Class Xia, Xia-Yu Ge, Meng Wang, Zhi-Xin Pan, Xian-Ming PLoS One Research Article Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and then feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1% when tested on a widely used, non-homologous benchmark dataset, 25PDB. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position Specific Scoring Matrix (PSSM) of a protein domain to a 4-dimensional (4D) structural feature vector, which could then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all of the existing sequence-based methods and had an overall accuracy of 83.1%, which is even higher than the results of methods based on predicted secondary structure. Public Library of Science 2012-06-19 /pmc/articles/PMC3378576/ /pubmed/22723837 http://dx.doi.org/10.1371/journal.pone.0037653 Text en Xia et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Xia, Xia-Yu Ge, Meng Wang, Zhi-Xin Pan, Xian-Ming Accurate Prediction of Protein Structural Class |
title | Accurate Prediction of Protein Structural Class |
title_sort | accurate prediction of protein structural class |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378576/ https://www.ncbi.nlm.nih.gov/pubmed/22723837 http://dx.doi.org/10.1371/journal.pone.0037653 |
work_keys_str_mv | AT xiaxiayu accuratepredictionofproteinstructuralclass AT gemeng accuratepredictionofproteinstructuralclass AT wangzhixin accuratepredictionofproteinstructuralclass AT panxianming accuratepredictionofproteinstructuralclass |
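
The description above outlines a two-step idea: a multiple linear regression (MLR) maps a sequence feature vector to a 4D vector, and that vector is then used to assign one of four structural classes. The following is a minimal sketch of that idea only, not the authors' implementation. Everything in it is an assumption made for illustration: the 8-dimensional synthetic features (standing in for the 440D PSSM-derived vector), the one-hot regression targets, the small ridge term added for numerical stability, the argmax decision rule, and all function names.

```python
import random

N_CLASSES = 4      # the four major structural classes
N_FEATURES = 8     # toy stand-in for the 440D PSSM-derived vector (assumption)


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_mlr(features, labels, ridge=1e-6):
    """Least-squares fit from features (plus intercept) to one-hot class targets."""
    x = [[1.0] + list(f) for f in features]
    y = [[1.0 if label == k else 0.0 for k in range(N_CLASSES)] for label in labels]
    xt = transpose(x)
    gram = matmul(xt, x)
    for i in range(len(gram)):
        gram[i][i] += ridge  # tiny ridge term, an assumption for stability
    return solve(gram, matmul(xt, y))  # weights: (n_features + 1) x 4


def to_4d(weights, feature):
    """Map one feature vector to its 4D regression output."""
    row = [1.0] + list(feature)
    return [sum(v * w[k] for v, w in zip(row, weights)) for k in range(N_CLASSES)]


def predict(weights, feature):
    """Assumed decision rule: pick the class with the largest 4D component."""
    scores = to_4d(weights, feature)
    return max(range(N_CLASSES), key=scores.__getitem__)


def make_synthetic(rng, n):
    """Synthetic data: class k raises features 2k and 2k+1, plus Gaussian noise."""
    xs, ys = [], []
    for _ in range(n):
        k = rng.randrange(N_CLASSES)
        x = [rng.gauss(0.0, 0.3) for _ in range(N_FEATURES)]
        x[2 * k] += 3.0
        x[2 * k + 1] += 3.0
        xs.append(x)
        ys.append(k)
    return xs, ys


def demo(seed=0):
    """Fit on one synthetic sample and return accuracy on a second, held-out one."""
    rng = random.Random(seed)
    train_x, train_y = make_synthetic(rng, 200)
    test_x, test_y = make_synthetic(rng, 100)
    weights = fit_mlr(train_x, train_y)
    hits = sum(predict(weights, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


if __name__ == "__main__":
    print(f"held-out accuracy on synthetic data: {demo():.2f}")
```

The accuracy printed here describes only the synthetic toy data and says nothing about the figures reported in the record. With real inputs, `make_synthetic` would be replaced by whatever feature extraction produces the 440D vectors, which this record does not describe.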