A machine learning approach for ranking clusters of docked protein‐protein complexes by pairwise cluster comparison

Reliable identification of near‐native poses of docked protein–protein complexes is still an unsolved problem. The intrinsic heterogeneity of protein–protein interactions is challenging for traditional biophysical or knowledge based potentials and the identification of many false positive binding si...

Bibliographic Details
Main Authors:	Pfeiffenberger, Erik; Chaleil, Raphael A.G.; Moal, Iain H.; Bates, Paul A.
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc. 2017
Subjects:	Articles
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5396268/ https://www.ncbi.nlm.nih.gov/pubmed/27935158 http://dx.doi.org/10.1002/prot.25218

_version_	1783230031335522304
author	Pfeiffenberger, Erik; Chaleil, Raphael A.G.; Moal, Iain H.; Bates, Paul A.
author_sort	Pfeiffenberger, Erik
collection	PubMed
description	Reliable identification of near‐native poses of docked protein–protein complexes is still an unsolved problem. The intrinsic heterogeneity of protein–protein interactions is challenging for traditional biophysical or knowledge‐based potentials, and the identification of many false‐positive binding sites is not unusual. Ranking protocols are often based on an initial clustering of docked poses, followed by the application of an energy function that ranks each cluster according to its lowest‐energy member. Here, we present a cluster‐ranking approach based not on a single molecular descriptor (e.g., an energy function) but on a large number of descriptors integrated in a machine learning model: an extremely randomized tree classifier trained on 109 molecular descriptors. The protocol first locally enriches clusters with additional poses. The clusters are then characterized using features that describe the distribution of molecular descriptors within each cluster, and these features are combined into a pairwise cluster comparison model that discriminates near‐native from incorrect clusters. The results show that our approach is able to identify clusters containing near‐native protein–protein complexes. In addition, we present an analysis of the descriptors with respect to their power to discriminate near‐native from incorrect clusters, and of how data transformations and recursive feature elimination can improve the ranking performance. Proteins 2017; 85:528–543. © 2016 Wiley Periodicals, Inc.
format	Online Article Text
id	pubmed-5396268
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	John Wiley and Sons Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-5396268 2017-04-25 Proteins Articles John Wiley and Sons Inc. 2017-01-20 2017-03 /pmc/articles/PMC5396268/ /pubmed/27935158 http://dx.doi.org/10.1002/prot.25218 Text en © 2016 The Authors. Proteins: Structure, Function, and Bioinformatics, published by Wiley Periodicals, Inc. This is an open access article under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title	A machine learning approach for ranking clusters of docked protein‐protein complexes by pairwise cluster comparison
title_sort	machine learning approach for ranking clusters of docked protein‐protein complexes by pairwise cluster comparison
topic	Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5396268/ https://www.ncbi.nlm.nih.gov/pubmed/27935158 http://dx.doi.org/10.1002/prot.25218
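
The abstract describes two steps that a short sketch can make concrete: summarising the distribution of molecular descriptors within each cluster, and ranking clusters through pairwise comparisons. The Python below is an illustrative sketch only, not the authors' implementation. The function names, the choice of summary statistics (min, mean, max, standard deviation), the win-counting aggregation, and the toy data are all assumptions. The trained extremely randomized tree classifier from the paper is replaced by a hypothetical stand-in comparator that simply prefers the lower mean value of the first descriptor.

```python
from itertools import permutations
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence

Features = List[float]


def cluster_features(descriptor_values: Sequence[Sequence[float]]) -> Features:
    """Summarise each descriptor's distribution over the poses of one cluster.

    descriptor_values[i][j] is descriptor j of pose i. The four summary
    statistics used here are an assumption made for illustration.
    """
    features: Features = []
    for column in zip(*descriptor_values):
        features.extend([min(column), mean(column), max(column), pstdev(column)])
    return features


def rank_clusters(
    clusters: Dict[str, Features],
    prefers_first: Callable[[Features, Features], bool],
) -> List[str]:
    """Rank clusters by the number of pairwise comparisons each one wins.

    Counting wins is an assumed aggregation scheme; ties are broken by name.
    """
    wins = {name: 0 for name in clusters}
    for a, b in permutations(clusters, 2):
        if prefers_first(clusters[a], clusters[b]):
            wins[a] += 1
    return sorted(clusters, key=lambda name: (-wins[name], name))


def lower_mean_energy(a: Features, b: Features) -> bool:
    """Hypothetical stand-in for a trained pairwise classifier.

    Index 1 holds the mean of descriptor 0 (treated here as an energy-like score).
    """
    return a[1] < b[1]


# Toy data: each pose has two made-up descriptor values.
poses = {
    "c1": [[-12.0, 0.8], [-10.0, 0.7], [-11.0, 0.9]],
    "c2": [[-5.0, 0.4], [-6.0, 0.5]],
    "c3": [[-8.0, 0.6], [-9.0, 0.6], [-7.0, 0.5]],
}
features = {name: cluster_features(values) for name, values in poses.items()}
ranking = rank_clusters(features, lower_mean_energy)
print(ranking)
```

In a fuller version, `prefers_first` would wrap a classifier trained on pairs of cluster feature vectors, for example scikit-learn's `ExtraTreesClassifier`; how the paper encodes each pair and aggregates the pairwise predictions is not stated in this record.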

Similar Items