
Searching the protein structure database for ligand-binding site similarities using CPASS v.2



Bibliographic Details
Main Authors: Powers, Robert; Copeland, Jennifer C; Stark, Jaime L; Caprez, Adam; Guru, Ashu; Swanson, David
Format: Text
Language: English
Published: BioMed Central 2011
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3057182/
https://www.ncbi.nlm.nih.gov/pubmed/21269480
http://dx.doi.org/10.1186/1756-0500-4-17
_version_ 1782200268804325376
author Powers, Robert
Copeland, Jennifer C
Stark, Jaime L
Caprez, Adam
Guru, Ashu
Swanson, David
author_sort Powers, Robert
collection PubMed
description BACKGROUND: A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ~8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, where ~30% are explicitly annotated as "hypothetical" or "uncharacterized" protein. Our Comparison of Protein Active-Site Structures (CPASS v.2) database and software compares the sequence and structural characteristics of experimentally determined ligand binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR) protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function. FINDINGS: We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability that also enables job distribution over the Open Science Grid (OSG) to increase speed. Similarly, the CPASS interface was enhanced to provide more user flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one versus all, one versus list, or list versus list comparisons. Solvent accessible surface area, ligand root-mean square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated. CONCLUSIONS: CPASS v.2 is more than an order of magnitude faster than the original implementation, and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ~ 38%, dramatically increasing the likelihood of a positive search result. 
The modification to the CPASS similarity function is effective in reducing CPASS similarity scores for false positives by ~30%, while leaving true positives unaffected. Importantly, receiver operating characteristics (ROC) curves demonstrate the high correlation between CPASS similarity scores and an accurate functional assignment. As indicated by distribution curves, scores ≥ 30% infer a functional similarity. Software URL: http://cpass.unl.edu.
format Text
id pubmed-3057182
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3057182 2011-03-17 BMC Res Notes Technical Note BioMed Central 2011-01-26 /pmc/articles/PMC3057182/ /pubmed/21269480 http://dx.doi.org/10.1186/1756-0500-4-17 Text en Copyright ©2011 Powers et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Searching the protein structure database for ligand-binding site similarities using CPASS v.2
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3057182/
https://www.ncbi.nlm.nih.gov/pubmed/21269480
http://dx.doi.org/10.1186/1756-0500-4-17