Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns
BACKGROUND: The unforgiving pace of growth of available biological data has increased the demand for efficient and scalable paradigms, models and methodologies for automatic annotation. In this paper, we present a novel structure-based protein function prediction and structural classification method...
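Main Authors: | Pires, Douglas EV; de Melo-Minardi, Raquel C; dos Santos, Marcos A; da Silveira, Carlos H; Santoro, Marcelo M; Meira, Wagner |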
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | Proceedings |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287581/ https://www.ncbi.nlm.nih.gov/pubmed/22369665 http://dx.doi.org/10.1186/1471-2164-12-S4-S12 |
_version_ | 1782224696116248576 |
---|---|
author | Pires, Douglas EV; de Melo-Minardi, Raquel C; dos Santos, Marcos A; da Silveira, Carlos H; Santoro, Marcelo M; Meira, Wagner |
collection | PubMed |
description | BACKGROUND: The unforgiving pace of growth of available biological data has increased the demand for efficient and scalable paradigms, models and methodologies for automatic annotation. In this paper, we present a novel structure-based protein function prediction and structural classification method: Cutoff Scanning Matrix (CSM). CSM generates feature vectors that represent distance patterns between protein residues. These feature vectors are then used as evidence for classification. Singular value decomposition is used as a preprocessing step to reduce dimensionality and noise. The aspect of protein function considered in the present work is enzyme activity. A series of experiments was performed on datasets based on Enzyme Commission (EC) numbers and mechanistically different enzyme superfamilies as well as other datasets derived from SCOP release 1.75. RESULTS: CSM was able to achieve a precision of up to 99% after SVD preprocessing for a database derived from manually curated protein superfamilies and up to 95% for a dataset of the 950 most-populated EC numbers. Moreover, we conducted experiments to verify our ability to assign SCOP class, superfamily, family and fold to protein domains. An experiment using the whole set of domains found in the last SCOP version yielded high levels of precision and recall (up to 95%). Finally, we compared our structural classification results with those in the literature to place this work into context. Our method was capable of significantly improving the recall of a previous study while preserving a compatible precision level. CONCLUSIONS: We showed that the patterns derived from CSMs could effectively be used to predict protein function and thus help with automatic function annotation. We also demonstrated that our method is effective in structural classification tasks. These facts reinforce the idea that the pattern of inter-residue distances is an important component of family structural signatures. Furthermore, singular value decomposition provided a consistent increase in precision and recall, which makes it an important preprocessing step when dealing with noisy data. |
format | Online Article Text |
id | pubmed-3287581 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3287581 2012-02-28 Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns Pires, Douglas EV; de Melo-Minardi, Raquel C; dos Santos, Marcos A; da Silveira, Carlos H; Santoro, Marcelo M; Meira, Wagner BMC Genomics Proceedings BACKGROUND: The unforgiving pace of growth of available biological data has increased the demand for efficient and scalable paradigms, models and methodologies for automatic annotation. In this paper, we present a novel structure-based protein function prediction and structural classification method: Cutoff Scanning Matrix (CSM). CSM generates feature vectors that represent distance patterns between protein residues. These feature vectors are then used as evidence for classification. Singular value decomposition is used as a preprocessing step to reduce dimensionality and noise. The aspect of protein function considered in the present work is enzyme activity. A series of experiments was performed on datasets based on Enzyme Commission (EC) numbers and mechanistically different enzyme superfamilies as well as other datasets derived from SCOP release 1.75. RESULTS: CSM was able to achieve a precision of up to 99% after SVD preprocessing for a database derived from manually curated protein superfamilies and up to 95% for a dataset of the 950 most-populated EC numbers. Moreover, we conducted experiments to verify our ability to assign SCOP class, superfamily, family and fold to protein domains. An experiment using the whole set of domains found in the last SCOP version yielded high levels of precision and recall (up to 95%). Finally, we compared our structural classification results with those in the literature to place this work into context. Our method was capable of significantly improving the recall of a previous study while preserving a compatible precision level. CONCLUSIONS: We showed that the patterns derived from CSMs could effectively be used to predict protein function and thus help with automatic function annotation. We also demonstrated that our method is effective in structural classification tasks. These facts reinforce the idea that the pattern of inter-residue distances is an important component of family structural signatures. Furthermore, singular value decomposition provided a consistent increase in precision and recall, which makes it an important preprocessing step when dealing with noisy data. BioMed Central 2011-12-22 /pmc/articles/PMC3287581/ /pubmed/22369665 http://dx.doi.org/10.1186/1471-2164-12-S4-S12 Text en Copyright ©2011 Pires et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287581/ https://www.ncbi.nlm.nih.gov/pubmed/22369665 http://dx.doi.org/10.1186/1471-2164-12-S4-S12 |
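
The description field states that CSM generates feature vectors representing distance patterns between protein residues. The record does not say how those vectors are computed, so the following is only a minimal sketch under stated assumptions: each residue is reduced to a single 3D point, and the vector holds, for each distance cutoff in a scanned range, the number of residue pairs no farther apart than that cutoff. The function name `csm_feature_vector` and the parameters `d_min`, `d_max` and `step` are hypothetical and are not taken from the record.

```python
import math
from itertools import combinations


def csm_feature_vector(coords, d_min=0.0, d_max=30.0, step=0.2):
    """Count residue pairs within each scanned distance cutoff.

    coords: list of (x, y, z) tuples, one representative point per residue
            (an assumption; the record does not say which atom is used).
    d_min, d_max, step: hypothetical scanning range and increment.
    """
    # All pairwise Euclidean distances between residue points.
    distances = [math.dist(a, b) for a, b in combinations(coords, 2)]

    # Scan cutoffs d_min, d_min + step, ..., d_max and count the pairs
    # whose distance does not exceed each cutoff.
    n_steps = int(round((d_max - d_min) / step))
    features = []
    for i in range(n_steps + 1):
        cutoff = d_min + i * step
        features.append(sum(1 for d in distances if d <= cutoff))
    return features


# Toy example: three points with pairwise distances 3, 4 and 5.
toy = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
vec = csm_feature_vector(toy, d_min=0.0, d_max=6.0, step=2.0)
print(vec)  # [0, 0, 2, 3] for cutoffs 0, 2, 4, 6
```

Vectors of this kind, one per protein, could be stacked into a matrix before the singular value decomposition step that the description mentions; that step is omitted here.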