A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces
Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a methodology that predicts Hot-Spots (HS) in protein-protein interfaces from their native complex structure more accurately than previously published Machine Learning (ML) techniques. Our model...
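Main authors: | Melo, Rita; Fieldhouse, Robert; Melo, André; Correia, João D. G.; Cordeiro, Maria Natália D. S.; Gümüş, Zeynep H.; Costa, Joaquim; Bonvin, Alexandre M. J. J.; Moreira, Irina S. |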
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
MDPI
2016
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5000613/ https://www.ncbi.nlm.nih.gov/pubmed/27472327 http://dx.doi.org/10.3390/ijms17081215 |
_version_ | 1782450321889427456 |
---|---|
author | Melo, Rita Fieldhouse, Robert Melo, André Correia, João D. G. Cordeiro, Maria Natália D. S. Gümüş, Zeynep H. Costa, Joaquim Bonvin, Alexandre M. J. J. Moreira, Irina S. |
author_sort | Melo, Rita |
collection | PubMed |
description | Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a methodology that predicts Hot-Spots (HS) in protein-protein interfaces from their native complex structure more accurately than previously published Machine Learning (ML) techniques. Our model is trained on a large number of complexes and on a significantly larger number of different structural and evolutionary sequence-based features. In particular, we added interface size, type of interaction between residues at the interface of the complex, number of different types of residues at the interface and the Position-Specific Scoring Matrix (PSSM), for a total of 79 features. We used twenty-seven algorithms, ranging from a simple linear function to support-vector machine models with different cost functions. The best model was obtained with the conditional inference random forest (c-forest) algorithm on a dataset pre-processed by normalization of the features and up-sampling of the minority class. The method has an overall accuracy of 0.80, an F1-score of 0.73, a sensitivity of 0.76 and a specificity of 0.82 on the independent test set. |
format | Online Article Text |
id | pubmed-5000613 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-5000613 2016-09-01 Int J Mol Sci Article MDPI 2016-07-27 /pmc/articles/PMC5000613/ /pubmed/27472327 http://dx.doi.org/10.3390/ijms17081215 Text en © 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5000613/ https://www.ncbi.nlm.nih.gov/pubmed/27472327 http://dx.doi.org/10.3390/ijms17081215 |
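
The description reports accuracy, F1-score, sensitivity and specificity, and mentions up-sampling of the minority class. As a rough illustration of what those terms mean for a binary hot-spot / non-hot-spot classifier, here is a minimal Python sketch. It is not the authors' pipeline (the record names the c-forest algorithm but gives no implementation details); the function names `upsample_minority` and `binary_metrics` and the toy labels are hypothetical, chosen only for this example.

```python
import random
from collections import Counter


def upsample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes have equal size.

    Assumes exactly two classes are present (illustrative helper, not from the paper).
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    (_, n_major), (minor, n_minor) = counts.most_common(2)
    minority_idx = [i for i, y in enumerate(labels) if y == minor]
    # Draw extra minority samples with replacement to close the gap.
    extra = [rng.choice(minority_idx) for _ in range(n_major - n_minor)]
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity and F1 from true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the negative class
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "f1": f1,
            "sensitivity": sensitivity, "specificity": specificity}


# Toy labels (hypothetical): 1 = hot-spot, 0 = non-hot-spot.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)

# Up-sampling a toy training set with one positive and three negatives.
toy_rows, toy_labels = upsample_minority([[0.1], [0.2], [0.3], [0.4]], [1, 0, 0, 0])
```

Up-sampling would normally be applied to the training split only, so that duplicated rows do not leak into the independent test set on which metrics such as those above are reported.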