Improving the accuracy of high-throughput protein-protein affinity prediction may require better training data

BACKGROUND: One goal of structural biology is to understand how a protein’s 3-dimensional conformation determines its capacity to interact with potential ligands. In the case of small chemical ligands, deconstructing a static protein-ligand complex into its constituent atom-atom interactions is typi...

Bibliographic Details
Main authors:	Dias, Raquel; Kolaczkowski, Bryan
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374557/ https://www.ncbi.nlm.nih.gov/pubmed/28361672 http://dx.doi.org/10.1186/s12859-017-1533-z

_version_	1782518911710789632
author	Dias, Raquel Kolaczkowski, Bryan
collection	PubMed
description	BACKGROUND: One goal of structural biology is to understand how a protein’s 3-dimensional conformation determines its capacity to interact with potential ligands. In the case of small chemical ligands, deconstructing a static protein-ligand complex into its constituent atom-atom interactions is typically sufficient to rapidly predict ligand affinity with high accuracy (>70% correlation between predicted and experimentally-determined affinity), a fact that is exploited to support structure-based drug design. We recently found that protein-DNA/RNA affinity can also be predicted with high accuracy using extensions of existing techniques, but protein-protein affinity could not be predicted with >60% correlation, even when the protein-protein complex was available. METHODS: X-ray and NMR structures of protein-protein complexes, their associated binding affinities and experimental conditions were obtained from different binding affinity and structural databases. Statistical models were implemented using a generalized linear model framework, including the experimental conditions as new model features. We evaluated the potential for new features to improve affinity prediction models by calculating the Pearson correlation between predicted and experimental binding affinities on the training and test data after model fitting and after cross-validation. Differences in accuracy were assessed using two-sample t test and nonparametric Mann–Whitney U test. RESULTS: Here we evaluate a range of potential factors that may interfere with accurate protein-protein affinity prediction. We find that X-ray crystal resolution has the strongest single effect on protein-protein affinity prediction. Limiting our analyses to only high-resolution complexes (≤2.5 Å) increased the correlation between predicted and experimental affinity from 54 to 68% (p = 4.32 × 10⁻³). In addition, incorporating information on the experimental conditions under which affinities were measured (pH, temperature and binding assay) had significant effects on prediction accuracy. We also highlight a number of potential errors in large structure-affinity databases, which could affect both model training and accuracy assessment. CONCLUSIONS: The results suggest that the accuracy of statistical models for protein-protein affinity prediction may be limited by the information present in databases used to train new models. Improving our capacity to integrate large-scale structural and functional information may be required to substantively advance our understanding of the general principles by which a protein’s structure determines its function. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1533-z) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5374557
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5374557 2017-03-31 Improving the accuracy of high-throughput protein-protein affinity prediction may require better training data Dias, Raquel Kolaczkowski, Bryan BMC Bioinformatics Research BioMed Central 2017-03-23 /pmc/articles/PMC5374557/ /pubmed/28361672 http://dx.doi.org/10.1186/s12859-017-1533-z Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Improving the accuracy of high-throughput protein-protein affinity prediction may require better training data
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374557/ https://www.ncbi.nlm.nih.gov/pubmed/28361672 http://dx.doi.org/10.1186/s12859-017-1533-z
