
A method for probabilistic mapping between protein structure and function taxonomies through cross training

BACKGROUND: Predicting protein function from structure, and vice versa, is a partially solved problem, largely in the domain of biophysics and biochemistry. This underscores the need for computational and bioinformatics approaches to the problem. Large and organized latent knowledge...


Bibliographic Details
Main authors: Gupta, Kshitiz; Sehgal, Vivek; Levchenko, Andre
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2573881/
https://www.ncbi.nlm.nih.gov/pubmed/18834528
http://dx.doi.org/10.1186/1472-6807-8-40
author Gupta, Kshitiz
Sehgal, Vivek
Levchenko, Andre
collection PubMed
description BACKGROUND: Predicting protein function from structure, and vice versa, is a partially solved problem, largely in the domain of biophysics and biochemistry. This underscores the need for computational and bioinformatics approaches to the problem. Large and organized latent knowledge on protein classification exists in the form of independently created protein classification databases. By creating probabilistic maps between classes of structural classification databases (e.g. SCOP [1]) and classes of functional classification databases (e.g. PROSITE [2]), the structure and function of proteins can be probabilistically related. RESULTS: We demonstrate that PROSITE and SCOP have significant semantic overlap, in spite of their independent classification schemes. Training classifiers of SCOP using classes of PROSITE as attributes, and vice versa, improved the accuracy of Support Vector Machine classifiers for both SCOP and PROSITE. Novel attributes, 2-D elastic profiles and Blocks, were used to improve time complexity and accuracy. Many relationships between classes of SCOP and PROSITE were extracted using decision trees. CONCLUSION: We demonstrate that the presented approach can discover new probabilistic relationships between classes of different taxonomies and render a more accurate classification. Extensive mappings between existing protein classification databases can be created to link the large amount of organized data. Probabilistic maps were created between classes of SCOP and PROSITE, allowing predictions of structure from function and vice versa. In our experiments, we also found that function is more strongly related to structure than structure is to function.
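The idea of a probabilistic map between two taxonomies can be illustrated with a minimal sketch. It is not the authors' method (the abstract describes Support Vector Machine classifiers and decision trees); it only estimates conditional probabilities from co-occurrence counts. The protein ids, class labels, and the helper `conditional_map` are hypothetical and are not taken from SCOP or PROSITE.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations: (protein id, structural class, functional class).
# All labels are invented for illustration; they are not SCOP or PROSITE entries.
ANNOTATIONS = [
    ("p1", "fold_A", "motif_X"),
    ("p2", "fold_A", "motif_X"),
    ("p3", "fold_A", "motif_Y"),
    ("p4", "fold_B", "motif_Z"),
    ("p5", "fold_A", "motif_Z"),
]


def conditional_map(pairs):
    """Return {given: {target: P(target | given)}} estimated from (given, target) pairs."""
    counts = defaultdict(Counter)
    for given, target in pairs:
        counts[given][target] += 1
    return {
        given: {target: n / sum(c.values()) for target, n in c.items()}
        for given, c in counts.items()
    }


# Map in both directions: structure given function, and function given structure.
structure_given_function = conditional_map((f, s) for _, s, f in ANNOTATIONS)
function_given_structure = conditional_map((s, f) for _, s, f in ANNOTATIONS)

if __name__ == "__main__":
    print(structure_given_function)
    print(function_given_structure)
```

Computing the map in both directions makes it possible to compare how sharply one taxonomy predicts the other; in this toy data, a functional class such as `motif_X` points to a single structural class, while `fold_A` is spread over several functional classes.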
format Text
id pubmed-2573881
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2573881 2008-10-27 A method for probabilistic mapping between protein structure and function taxonomies through cross training Gupta, Kshitiz Sehgal, Vivek Levchenko, Andre BMC Struct Biol Research Article
BioMed Central 2008-10-03 /pmc/articles/PMC2573881/ /pubmed/18834528 http://dx.doi.org/10.1186/1472-6807-8-40 Text en Copyright © 2008 Gupta et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A method for probabilistic mapping between protein structure and function taxonomies through cross training
topic Research Article