
A random forest classifier for protein–protein docking models


Bibliographic Details
Main Authors:	Barradas-Bautista, Didier; Cao, Zhen; Vangone, Anna; Oliva, Romina; Cavallo, Luigi
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710594/ https://www.ncbi.nlm.nih.gov/pubmed/36699405 http://dx.doi.org/10.1093/bioadv/vbab042

collection	PubMed
description	Herein, we present the results of a machine learning approach we developed to single out correct 3D docking models of protein–protein complexes obtained by popular docking software. To this aim, we generated [Formula: see text] docking models for each of the 230 complexes in the protein–protein benchmark, version 5, using three different docking programs (HADDOCK, FTDock and ZDOCK), for a cumulative set of [Formula: see text] docking models. Three different machine learning approaches (Random Forest, Support Vector Machine and Perceptron) were used to train classifiers with 158 different scoring functions (features). The Random Forest algorithm outperformed the other two and was selected for further optimization. Using a feature selection algorithm and optimizing the random forest hyperparameters allowed us to train and validate a random forest classifier, named COnservation Driven Expert System (CoDES). Testing of CoDES on independent datasets, as well as its comparative performance against machine learning methods recently developed in the field for scoring docking decoys, confirms its state-of-the-art ability to discriminate correct from incorrect decoys, both in terms of global parameters and in terms of the decoys ranked at the top positions. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online. SOFTWARE AND DATA AVAILABILITY STATEMENT: The docking models are available at https://doi.org/10.5281/zenodo.4012018. The programs underlying this article will be shared upon request to the corresponding authors.
format	Online Article Text
id	pubmed-9710594
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
source	Bioinform Adv, Oxford University Press, published online 2021-12-10
rights	© The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A random forest classifier for protein–protein docking models
