Prediction of Protein–Protein Interaction Sites in Sequences and 3D Structures by Random Forests
Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structu...
Main authors: | Šikić, Mile; Tomić, Sanja; Vlahoviček, Kristian |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science 2009 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2621338/ https://www.ncbi.nlm.nih.gov/pubmed/19180183 http://dx.doi.org/10.1371/journal.pcbi.1000278 |
_version_ | 1782163394970779648 |
---|---|
author | Šikić, Mile Tomić, Sanja Vlahoviček, Kristian |
author_facet | Šikić, Mile Tomić, Sanja Vlahoviček, Kristian |
author_sort | Šikić, Mile |
collection | PubMed |
description | Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras–Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information. |
format | Text |
id | pubmed-2621338 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-26213382009-01-30 Prediction of Protein–Protein Interaction Sites in Sequences and 3D Structures by Random Forests Šikić, Mile Tomić, Sanja Vlahoviček, Kristian PLoS Comput Biol Research Article Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras–Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information. Public Library of Science 2009-01-30 /pmc/articles/PMC2621338/ /pubmed/19180183 http://dx.doi.org/10.1371/journal.pcbi.1000278 Text en Šikić et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Prediction of Protein–Protein Interaction Sites in Sequences and 3D Structures by Random Forests |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2621338/ https://www.ncbi.nlm.nih.gov/pubmed/19180183 http://dx.doi.org/10.1371/journal.pcbi.1000278 |
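
The description field above quotes precision, recall and F-measure for both predictors. As a quick consistency check, the sketch below recomputes the F-measure from the quoted precision and recall. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the record does not state which variant the authors used, and the function name `f_measure` is illustrative, not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is F1; the record does not say so.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, F-measure) as quoted in the description, as fractions.
reported = {
    "sequence only": (0.84, 0.26, 0.40),
    "sequence + structure": (0.76, 0.38, 0.51),
}

for label, (p, r, f_reported) in reported.items():
    f = f_measure(p, r)
    print(f"{label}: computed F = {f:.3f}, description reports {f_reported:.2f}")
```

Under that assumption, both computed values round to the figures given in the description (40% and 51%).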