
Prediction of Protein–Protein Interaction Sites in Sequences and 3D Structures by Random Forests

Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structu...


Bibliographic Details
Main Authors: Šikić, Mile, Tomić, Sanja, Vlahoviček, Kristian
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2621338/
https://www.ncbi.nlm.nih.gov/pubmed/19180183
http://dx.doi.org/10.1371/journal.pcbi.1000278
author Šikić, Mile
Tomić, Sanja
Vlahoviček, Kristian
collection PubMed
description Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras–Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information.
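The sliding-window idea and the reported F-measures in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sliding_windows` and `f_measure`, the `X` padding at the sequence ends, and the toy input sequence are all assumptions made for illustration. The F-measure is taken to be the usual harmonic mean of precision and recall, which reproduces the figures quoted in the abstract.

```python
from typing import List


def sliding_windows(sequence: str, size: int = 9, pad: str = "X") -> List[str]:
    """Return one window per residue, centred on that residue.

    The padding character for the sequence ends is an assumption; the
    record does not say how the termini were handled.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd so it has a central residue")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy sequence, used only to show the window shape.
    windows = sliding_windows("ACDEFGHIKLMNPQ", size=9)
    print(len(windows), windows[0])

    # Precision/recall pairs quoted in the abstract.
    print(round(f_measure(0.84, 0.26), 2))  # sequence only
    print(round(f_measure(0.76, 0.38), 2))  # sequence plus structure
```

In a predictor of the kind described, each window would presumably be turned into a feature vector and passed to a Random Forests classifier that labels the central residue; that step is omitted here because the record does not specify the features.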
format Text
id pubmed-2621338
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-26213382009-01-30 Prediction of Protein–Protein Interaction Sites in Sequences and 3D Structures by Random Forests Šikić, Mile Tomić, Sanja Vlahoviček, Kristian PLoS Comput Biol Research Article Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras–Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information. Public Library of Science 2009-01-30 /pmc/articles/PMC2621338/ /pubmed/19180183 http://dx.doi.org/10.1371/journal.pcbi.1000278 Text en Šikić et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Prediction of Protein–Protein Interaction Sites in Sequences and 3D Structures by Random Forests
topic Research Article