A PDB-wide, evolution-based assessment of protein–protein interfaces
Main authors: | Baskaran, Kumaran; Duarte, Jose M; Biyani, Nikhil; Bliven, Spencer; Capitani, Guido |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4274722/ https://www.ncbi.nlm.nih.gov/pubmed/25326082 http://dx.doi.org/10.1186/s12900-014-0022-0 |
_version_ | 1782350026548183040 |
---|---|
author | Baskaran, Kumaran Duarte, Jose M Biyani, Nikhil Bliven, Spencer Capitani, Guido |
author_sort | Baskaran, Kumaran |
collection | PubMed |
description | BACKGROUND: Thanks to the growth in sequence and structure databases, more than 50 million sequences are now available in UniProt and 100,000 structures in the PDB. Rich information about protein–protein interfaces can be obtained by a comprehensive study of protein contacts in the PDB, their sequence conservation and geometric features. RESULTS: An automated computational pipeline was developed to run our Evolutionary Protein–Protein Interface Classifier (EPPIC) software on the entire PDB and store the results in a relational database, currently containing > 800,000 interfaces. This allows the analysis of interface data on a PDB-wide scale. Two large benchmark datasets of biological interfaces and crystal contacts, each containing about 3000 entries, were automatically generated based on criteria thought to be strong indicators of interface type. The BioMany set of biological interfaces includes NMR dimers solved as crystal structures and interfaces that are preserved across diverse crystal forms, as catalogued by the Protein Common Interface Database (ProtCID) from Xu and Dunbrack. The second dataset, XtalMany, is derived from interfaces that would lead to infinite assemblies and are therefore crystal contacts. BioMany and XtalMany were used to benchmark the EPPIC approach. The performance of EPPIC was also compared to classifications from the Protein Interfaces, Surfaces, and Assemblies (PISA) program on a PDB-wide scale, finding that the two approaches give the same call in about 88% of PDB interfaces. By comparing our safest predictions to the PDB author annotations, we provide a lower-bound estimate of the error rate of biological unit annotations in the PDB. Additionally, we developed a PyMOL plugin for direct download and easy visualization of EPPIC interfaces for any PDB entry. Both the datasets and the PyMOL plugin are available at http://www.eppic-web.org/ewui/#downloads. CONCLUSIONS: Our computational pipeline allows us to analyze protein–protein contacts and their sequence conservation across the entire PDB. Two new benchmark datasets are provided, which are over an order of magnitude larger than existing manually curated ones. These tools enable the comprehensive study of several aspects of protein–protein contacts in the PDB and represent a basis for future, even larger scale studies of protein–protein interactions. |
format | Online Article Text |
id | pubmed-4274722 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4274722 2014-12-24; BMC Struct Biol; Research Article; BioMed Central 2014-10-18; /pmc/articles/PMC4274722/ /pubmed/25326082 http://dx.doi.org/10.1186/s12900-014-0022-0; Copyright © 2014 Baskaran et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A PDB-wide, evolution-based assessment of protein–protein interfaces |
title_sort | pdb-wide, evolution-based assessment of protein–protein interfaces |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4274722/ https://www.ncbi.nlm.nih.gov/pubmed/25326082 http://dx.doi.org/10.1186/s12900-014-0022-0 |
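The abstract describes two evaluations: benchmarking EPPIC on the BioMany and XtalMany sets, and measuring how often EPPIC and PISA give the same call across PDB interfaces. The Python sketch below illustrates only the arithmetic behind both, under stated assumptions: calls are plain strings (`"bio"` or `"xtal"`), interfaces are keyed by hypothetical IDs such as `"1abc_1"`, and the helper names `benchmark` and `agreement` are illustrative. It does not reproduce EPPIC's or PISA's actual output formats, scoring, or results.

```python
from typing import Dict, Sequence

# Hypothetical call labels (an assumption, not EPPIC's or PISA's output format).
BIO = "bio"    # biological interface
XTAL = "xtal"  # crystal contact


def benchmark(bio_set_calls: Sequence[str], xtal_set_calls: Sequence[str]) -> Dict[str, float]:
    """Score predictions on two reference sets of known interface type.

    bio_set_calls:  calls made on interfaces assumed biological (a BioMany-like set).
    xtal_set_calls: calls made on interfaces assumed to be crystal contacts (an XtalMany-like set).
    """
    if not bio_set_calls or not xtal_set_calls:
        raise ValueError("both reference sets must be non-empty")
    tp = sum(call == BIO for call in bio_set_calls)    # biological, called biological
    fn = len(bio_set_calls) - tp                       # biological, any other call
    tn = sum(call == XTAL for call in xtal_set_calls)  # crystal contact, called crystal
    fp = len(xtal_set_calls) - tn                      # crystal contact, any other call
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def agreement(calls_a: Dict[str, str], calls_b: Dict[str, str]) -> float:
    """Fraction of interfaces present in both maps that receive the same call."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no interfaces in common")
    same = sum(calls_a[iface] == calls_b[iface] for iface in shared)
    return same / len(shared)


if __name__ == "__main__":
    # Toy data only; these are not results from the paper.
    print(benchmark(["bio", "bio", "xtal", "bio"], ["xtal", "xtal", "bio", "xtal"]))
    # {'sensitivity': 0.75, 'specificity': 0.75, 'accuracy': 0.75}
    a = {"1abc_1": "bio", "1abc_2": "xtal", "2xyz_1": "bio"}
    b = {"1abc_1": "bio", "1abc_2": "bio", "2xyz_1": "bio"}
    print(round(agreement(a, b), 3))
    # 0.667
```

Treating every interface in the biological set as a positive and every interface in the crystal-contact set as a negative is what makes sensitivity and specificity well defined here; in this sketch any call other than the expected one, including a missing prediction, is counted as an error. The agreement figure is computed only over interfaces that both classifiers scored.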