A method for probabilistic mapping between protein structure and function taxonomies through cross training
BACKGROUND: Prediction of function of proteins on the basis of structure and vice versa is a partially solved problem, largely in the domain of biophysics and biochemistry. This underlies the need of computational and bioinformatics approach to solve the problem. Large and organized latent knowledge...
Main Authors: | Gupta, Kshitiz; Sehgal, Vivek; Levchenko, Andre |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2573881/ https://www.ncbi.nlm.nih.gov/pubmed/18834528 http://dx.doi.org/10.1186/1472-6807-8-40 |
_version_ | 1782160282376732672 |
---|---|
author | Gupta, Kshitiz Sehgal, Vivek Levchenko, Andre |
author_facet | Gupta, Kshitiz Sehgal, Vivek Levchenko, Andre |
author_sort | Gupta, Kshitiz |
collection | PubMed |
description | BACKGROUND: Prediction of function of proteins on the basis of structure and vice versa is a partially solved problem, largely in the domain of biophysics and biochemistry. This underlies the need of computational and bioinformatics approach to solve the problem. Large and organized latent knowledge on protein classification exists in the form of independently created protein classification databases. By creating probabilistic maps between classes of structural classification databases (e.g. SCOP [1]) and classes of functional classification databases (e.g. PROSITE [2]), structure and function of proteins could be probabilistically related. RESULTS: We demonstrate that PROSITE and SCOP have significant semantic overlap, in spite of independent classification schemes. By training classifiers of SCOP using classes of PROSITE as attributes and vice versa, accuracy of Support Vector Machine classifiers for both SCOP and PROSITE was improved. Novel attributes, 2-D elastic profiles and Blocks were used to improve time complexity and accuracy. Many relationships were extracted between classes of SCOP and PROSITE using decision trees. CONCLUSION: We demonstrate that presented approach can discover new probabilistic relationships between classes of different taxonomies and render a more accurate classification. Extensive mappings between existing protein classification databases can be created to link the large amount of organized data. Probabilistic maps were created between classes of SCOP and PROSITE allowing predictions of structure using function, and vice versa. In our experiments, we also found that functions are indeed more strongly related to structure than are structure to functions. |
format | Text |
id | pubmed-2573881 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2573881 2008-10-27 A method for probabilistic mapping between protein structure and function taxonomies through cross training Gupta, Kshitiz Sehgal, Vivek Levchenko, Andre BMC Struct Biol Research Article BioMed Central 2008-10-03 /pmc/articles/PMC2573881/ /pubmed/18834528 http://dx.doi.org/10.1186/1472-6807-8-40 Text en Copyright © 2008 Gupta et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Gupta, Kshitiz Sehgal, Vivek Levchenko, Andre A method for probabilistic mapping between protein structure and function taxonomies through cross training |
title | A method for probabilistic mapping between protein structure and function taxonomies through cross training |
title_full | A method for probabilistic mapping between protein structure and function taxonomies through cross training |
title_fullStr | A method for probabilistic mapping between protein structure and function taxonomies through cross training |
title_full_unstemmed | A method for probabilistic mapping between protein structure and function taxonomies through cross training |
title_short | A method for probabilistic mapping between protein structure and function taxonomies through cross training |
title_sort | method for probabilistic mapping between protein structure and function taxonomies through cross training |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2573881/ https://www.ncbi.nlm.nih.gov/pubmed/18834528 http://dx.doi.org/10.1186/1472-6807-8-40 |
work_keys_str_mv | AT guptakshitiz amethodforprobabilisticmappingbetweenproteinstructureandfunctiontaxonomiesthroughcrosstraining AT sehgalvivek amethodforprobabilisticmappingbetweenproteinstructureandfunctiontaxonomiesthroughcrosstraining AT levchenkoandre amethodforprobabilisticmappingbetweenproteinstructureandfunctiontaxonomiesthroughcrosstraining AT guptakshitiz methodforprobabilisticmappingbetweenproteinstructureandfunctiontaxonomiesthroughcrosstraining AT sehgalvivek methodforprobabilisticmappingbetweenproteinstructureandfunctiontaxonomiesthroughcrosstraining AT levchenkoandre methodforprobabilisticmappingbetweenproteinstructureandfunctiontaxonomiesthroughcrosstraining |