
A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces

Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a methodology that predicts Hot-Spots (HS) in protein-protein interfaces from their native complex structure more accurately than previously published Machine Learning (ML) techniques. Our model...


Bibliographic Details
Main authors: Melo, Rita, Fieldhouse, Robert, Melo, André, Correia, João D. G., Cordeiro, Maria Natália D. S., Gümüş, Zeynep H., Costa, Joaquim, Bonvin, Alexandre M. J. J., Moreira, Irina S.
Format: Online Article Text
Language: English
Published: MDPI 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5000613/
https://www.ncbi.nlm.nih.gov/pubmed/27472327
http://dx.doi.org/10.3390/ijms17081215
author Melo, Rita
Fieldhouse, Robert
Melo, André
Correia, João D. G.
Cordeiro, Maria Natália D. S.
Gümüş, Zeynep H.
Costa, Joaquim
Bonvin, Alexandre M. J. J.
Moreira, Irina S.
collection PubMed
description Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a methodology that predicts Hot-Spots (HS) in protein-protein interfaces from their native complex structure more accurately than previously published Machine Learning (ML) techniques. Our model is trained on a large number of complexes and on a significantly larger number of different structural- and evolutionary sequence-based features. In particular, we added interface size, type of interaction between residues at the interface of the complex, number of different types of residues at the interface and the Position-Specific Scoring Matrix (PSSM), for a total of 79 features. We used twenty-seven algorithms, ranging from a simple linear-based function to support-vector machine models with different cost functions. The best model was obtained with the conditional inference random forest (c-forest) algorithm on a dataset pre-processed by normalization of features and up-sampling of the minority class. The method has an overall accuracy of 0.80, an F1-score of 0.73, a sensitivity of 0.76 and a specificity of 0.82 on the independent test set.
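The pre-processing and evaluation steps named in the description (feature normalization, up-sampling of the smaller class, and the four reported metrics) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the choice of min-max scaling, and the toy data are assumptions made for illustration, and the c-forest classifier itself is not shown.

```python
import random
from collections import Counter


def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0.

    Min-max scaling is an assumption; the record only says "normalization".
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def upsample_minority(rows, labels, seed=0):
    """Resample smaller classes with replacement until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and F1 from the confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Toy data (hypothetical): two features per residue, label 1 = hot-spot.
features = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
labels = [1, 0, 0]
scaled = min_max_normalize(features)
balanced_rows, balanced_labels = upsample_minority(scaled, labels)
metrics = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

As a general practice, up-sampling is applied to the training data only, so that the test set keeps its original class balance when the metrics are computed.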
format Online
Article
Text
id pubmed-5000613
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-5000613 2016-09-01 Int J Mol Sci Article MDPI 2016-07-27 /pmc/articles/PMC5000613/ /pubmed/27472327 http://dx.doi.org/10.3390/ijms17081215 Text en © 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
title A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces
topic Article