Structure based function prediction of proteins using fragment library frequency vectors

Bibliographic Details
Main authors: Yadav, Akshay; Jayaraman, Valadi Krishnamoorthy
Format: Online Article Text
Language: English
Published: Biomedical Informatics 2012
Subjects: Prediction Model
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3488839/
https://www.ncbi.nlm.nih.gov/pubmed/23144557
http://dx.doi.org/10.6026/97320630008953
author Yadav, Akshay
Jayaraman, Valadi Krishnamoorthy
collection PubMed
description The function of a protein is primarily dictated by its structure. It is therefore logical to look for functional clues in the protein's overall three-dimensional fold, that is, its global structure. In this paper, we have developed a novel Support Vector Machine (SVM)-based model for the functional classification and prediction of proteins, using features extracted from their global structure by means of fragment libraries. Fragment libraries have previously been used for ab initio modelling of proteins and for protein structure comparison. The query protein structure is broken down into a collection of short contiguous backbone fragments, and this collection is discretized using a library of fragments. The input feature vector is a frequency vector that counts the occurrences of each library fragment in the collection, as determined by all-to-all fragment comparisons. SVM models were trained and optimised to obtain the best 10-fold cross-validation (CV) accuracy for classification. As an example, the method was applied to the prediction and classification of cell adhesion molecules (CAMs). Thirty-four different fragment libraries, with sizes ranging from 4 to 400 fragments and fragment lengths ranging from 4 to 12, were evaluated to obtain the best prediction model. The best 10-fold CV accuracy, 95.25%, was obtained for a library of 400 fragments of length 10. An accuracy of 87.5% was obtained on an unseen test dataset consisting of 20 CAMs and 20 non-CAMs. This shows that protein structure can be accurately and uniquely described using 400 representative fragments of length 10.
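The fragment-counting step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fragments`, `drmsd`, `frequency_vector`) are hypothetical, the fragment comparison uses a superposition-free distance RMSD (dRMSD) because the record does not say which measure the paper uses, and the two-fragment library and coordinates are toy values rather than one of the thirty-four libraries mentioned above.

```python
import math
from collections import Counter


def fragments(coords, length):
    """Return every contiguous backbone fragment of the given length."""
    return [coords[i:i + length] for i in range(len(coords) - length + 1)]


def internal_distances(fragment):
    """Pairwise distances within a fragment (rotation/translation invariant)."""
    return [math.dist(a, b)
            for i, a in enumerate(fragment)
            for b in fragment[i + 1:]]


def drmsd(frag_a, frag_b):
    """Distance RMSD between two fragments of equal length (assumed measure)."""
    da, db = internal_distances(frag_a), internal_distances(frag_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def frequency_vector(coords, library):
    """Count how often each library fragment is the closest match."""
    length = len(library[0])
    counts = Counter()
    for frag in fragments(coords, length):
        # Compare the fragment against every library fragment, keep the nearest.
        best = min(range(len(library)), key=lambda k: drmsd(frag, library[k]))
        counts[best] += 1
    return [counts[k] for k in range(len(library))]


# Toy library (hypothetical): one extended and one square-shaped fragment.
library = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)],
]

# Toy backbone (hypothetical): six points in a straight line along the y axis.
chain = [(0.0, 3.8 * i, 0.0) for i in range(6)]

vector = frequency_vector(chain, library)
print(vector)
```

A vector of this kind, one per protein and with one entry per library fragment, is what would then be passed to an SVM classifier and scored by 10-fold cross-validation.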
format Online
Article
Text
id pubmed-3488839
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Biomedical Informatics
record_format MEDLINE/PubMed
spelling pubmed-3488839 2012-11-09 Structure based function prediction of proteins using fragment library frequency vectors Yadav, Akshay Jayaraman, Valadi Krishnamoorthy Bioinformation Prediction Model
Biomedical Informatics 2012-10-01 /pmc/articles/PMC3488839/ /pubmed/23144557 http://dx.doi.org/10.6026/97320630008953 Text en © 2012 Biomedical Informatics This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.
title Structure based function prediction of proteins using fragment library frequency vectors
topic Prediction Model
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3488839/
https://www.ncbi.nlm.nih.gov/pubmed/23144557
http://dx.doi.org/10.6026/97320630008953