
The influence of the inactives subset generation on the performance of machine learning methods

Bibliographic Details
Main authors: Smusz, Sabina, Kurczab, Rafał, Bojarski, Andrzej J
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3626618/
https://www.ncbi.nlm.nih.gov/pubmed/23561266
http://dx.doi.org/10.1186/1758-2946-5-17
author Smusz, Sabina
Kurczab, Rafał
Bojarski, Andrzej J
collection PubMed
description BACKGROUND: The application of machine learning methods in virtual screening, for both classification and regression tasks, has grown in popularity over the past few years. However, their effectiveness depends strongly on many different factors. RESULTS: In this study, the influence of the way the set of inactives is formed on the classification process was examined, comparing random and diverse selection from the ZINC database, the MDDR database, and libraries generated according to the DUD methodology. All learning methods were tested in two modes: using a single test set, identical for every method of generating inactive molecules, and using test sets with inactives prepared in the same way as for training. The experiments were carried out for 5 different protein targets, 3 fingerprints for molecule representation, and 7 classification algorithms with varying parameters. It appeared that the process of forming the inactive set had a substantial impact on the performance of the machine learning methods. CONCLUSIONS: The degree to which chemical space was limited determined the ability of the tested classifiers to select potentially active molecules in virtual screening tasks; for example, DUDs (widely applied in docking experiments) did not allow proper selection of active molecules from databases with diverse structures. The study clearly showed that the inactive compounds forming the training set should be as representative as possible of the libraries that undergo screening.
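The abstract contrasts random and diverse selection of inactives. As a rough illustration only, the Python sketch below compares uniform random sampling with greedy MaxMin diversity picking on Tanimoto distance, a common generic way to obtain a diverse subset; it is not necessarily the selection procedure the authors used. The set-of-on-bits fingerprint representation, the function names (tanimoto, random_selection, maxmin_selection), and the toy pool are hypothetical.

```python
import random
from typing import FrozenSet, List, Sequence

# Hypothetical representation: a fingerprint is the set of its "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def random_selection(pool: Sequence[Fingerprint], k: int, seed: int = 0) -> List[int]:
    """Return k distinct pool indices drawn uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(len(pool)), k)


def maxmin_selection(pool: Sequence[Fingerprint], k: int, start: int = 0) -> List[int]:
    """Greedy MaxMin diversity picking with distance = 1 - Tanimoto.

    Starting from pool[start], repeatedly add the candidate whose nearest
    already-selected compound is farthest away (ties go to the lowest index).
    """
    if not 0 < k <= len(pool):
        raise ValueError("k must be between 1 and len(pool)")
    selected = [start]
    # min_dist[i]: distance from compound i to its closest selected compound
    min_dist = [1.0 - tanimoto(fp, pool[start]) for fp in pool]
    while len(selected) < k:
        candidates = [i for i in range(len(pool)) if i not in selected]
        best = max(candidates, key=lambda i: min_dist[i])
        selected.append(best)
        # Update each compound's nearest-selected distance with the new pick
        for i, fp in enumerate(pool):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fp, pool[best]))
    return selected


# Toy "inactive" pool (made-up fingerprints, not data from the study)
pool = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 4}),
    frozenset({10, 11}),
    frozenset({1, 2, 3, 4}),
    frozenset({20, 21, 22}),
]
print(random_selection(pool, 3, seed=1))
print(maxmin_selection(pool, 3))  # → [0, 2, 4]
```

In the toy pool, MaxMin skips compounds 1 and 3, which are close analogues of the seed compound 0, while random sampling has no such preference.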
format Online
Article
Text
id pubmed-3626618
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3626618 2013-04-23 J Cheminform Research Article BioMed Central 2013-04-05 /pmc/articles/PMC3626618/ /pubmed/23561266 http://dx.doi.org/10.1186/1758-2946-5-17 Text en Copyright © 2013 Smusz et al.; licensee Chemistry Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The influence of the inactives subset generation on the performance of machine learning methods
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3626618/
https://www.ncbi.nlm.nih.gov/pubmed/23561266
http://dx.doi.org/10.1186/1758-2946-5-17