Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns

BACKGROUND: The unforgiving pace of growth of available biological data has increased the demand for efficient and scalable paradigms, models and methodologies for automatic annotation. In this paper, we present a novel structure-based protein function prediction and structural classification method...

Bibliographic Details
Main authors: Pires, Douglas EV; de Melo-Minardi, Raquel C; dos Santos, Marcos A; da Silveira, Carlos H; Santoro, Marcelo M; Meira, Wagner
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287581/
https://www.ncbi.nlm.nih.gov/pubmed/22369665
http://dx.doi.org/10.1186/1471-2164-12-S4-S12
_version_ 1782224696116248576
author Pires, Douglas EV
de Melo-Minardi, Raquel C
dos Santos, Marcos A
da Silveira, Carlos H
Santoro, Marcelo M
Meira, Wagner
author_facet Pires, Douglas EV
de Melo-Minardi, Raquel C
dos Santos, Marcos A
da Silveira, Carlos H
Santoro, Marcelo M
Meira, Wagner
author_sort Pires, Douglas EV
collection PubMed
description BACKGROUND: The unforgiving pace of growth of available biological data has increased the demand for efficient and scalable paradigms, models and methodologies for automatic annotation. In this paper, we present a novel structure-based protein function prediction and structural classification method: Cutoff Scanning Matrix (CSM). CSM generates feature vectors that represent distance patterns between protein residues. These feature vectors are then used as evidence for classification. Singular value decomposition is used as a preprocessing step to reduce dimensionality and noise. The aspect of protein function considered in the present work is enzyme activity. A series of experiments was performed on datasets based on Enzyme Commission (EC) numbers and mechanistically different enzyme superfamilies as well as other datasets derived from SCOP release 1.75. RESULTS: CSM was able to achieve a precision of up to 99% after SVD preprocessing for a database derived from manually curated protein superfamilies and up to 95% for a dataset of the 950 most-populated EC numbers. Moreover, we conducted experiments to verify our ability to assign SCOP class, superfamily, family and fold to protein domains. An experiment using the whole set of domains found in the latest SCOP version yielded high levels of precision and recall (up to 95%). Finally, we compared our structural classification results with those in the literature to place this work into context. Our method was capable of significantly improving the recall of a previous study while preserving a compatible precision level. CONCLUSIONS: We showed that the patterns derived from CSMs could effectively be used to predict protein function and thus help with automatic function annotation. We also demonstrated that our method is effective in structural classification tasks. These facts reinforce the idea that the pattern of inter-residue distances is an important component of family structural signatures. Furthermore, singular value decomposition provided a consistent increase in precision and recall, which makes it an important preprocessing step when dealing with noisy data.
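The abstract describes two steps: building a feature vector from inter-residue distance patterns, and reducing it with singular value decomposition. The following is a minimal sketch of how that could look, not the authors' implementation. It assumes one 3D coordinate per residue and counts residue pairs that fall within each of a series of increasing distance cutoffs. The function names, the default cutoff range (0 to 30 Å in 0.2 Å steps) and the choice of `k` are illustrative assumptions that this record does not specify.

```python
import numpy as np


def cutoff_scan_vector(coords, d_min=0.0, d_max=30.0, d_step=0.2):
    """Count residue pairs within each distance cutoff (hypothetical helper).

    coords: one (x, y, z) point per residue, for example an alpha-carbon.
    The default cutoff range and step are assumptions for illustration.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all residues.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep each unordered pair once (upper triangle, diagonal excluded).
    pair_dist = dist[np.triu_indices(len(coords), k=1)]
    n_steps = int(round((d_max - d_min) / d_step))
    cutoffs = d_min + d_step * np.arange(1, n_steps + 1)
    # One feature per cutoff: how many pairs are at most that far apart.
    return np.array([(pair_dist <= c).sum() for c in cutoffs])


def svd_reduce(matrix, k):
    """Project each row onto the top-k singular directions (truncated SVD)."""
    u, s, _ = np.linalg.svd(np.asarray(matrix, dtype=float), full_matrices=False)
    return u[:, :k] * s[:k]


# Toy usage: two made-up "proteins" of three residues each.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
features = np.vstack([
    cutoff_scan_vector(toy_a, d_max=4.0, d_step=1.0),
    cutoff_scan_vector(toy_b, d_max=4.0, d_step=1.0),
])
reduced = svd_reduce(features, k=1)
```

Each row of `features` is one structure's feature vector, and `reduced` is the lower-dimensional matrix that would be handed to a classifier. Counting pairs at cumulative cutoffs keeps the vector length independent of protein size, which is why it is a convenient input for standard classifiers.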
format Online
Article
Text
id pubmed-3287581
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3287581 2012-02-28 Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns Pires, Douglas EV de Melo-Minardi, Raquel C dos Santos, Marcos A da Silveira, Carlos H Santoro, Marcelo M Meira, Wagner BMC Genomics Proceedings BioMed Central 2011-12-22 /pmc/articles/PMC3287581/ /pubmed/22369665 http://dx.doi.org/10.1186/1471-2164-12-S4-S12 Text en Copyright ©2011 Pires et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Pires, Douglas EV
de Melo-Minardi, Raquel C
dos Santos, Marcos A
da Silveira, Carlos H
Santoro, Marcelo M
Meira, Wagner
Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns
title Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns
title_full Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns
title_fullStr Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns
title_full_unstemmed Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns
title_short Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns
title_sort cutoff scanning matrix (csm): structural classification and function prediction by protein inter-residue distance patterns
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287581/
https://www.ncbi.nlm.nih.gov/pubmed/22369665
http://dx.doi.org/10.1186/1471-2164-12-S4-S12