Large-scale ligand-based predictive modelling using support vector machines

The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on o...

Bibliographic Details
Main Authors: Alvarsson, Jonathan, Lampa, Samuel, Schaal, Wesley, Andersson, Claes, Wikberg, Jarl E. S., Spjuth, Ola
Format: Online Article Text
Language: English
Published: Springer International Publishing 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4980776/
https://www.ncbi.nlm.nih.gov/pubmed/27516811
http://dx.doi.org/10.1186/s13321-016-0151-5
_version_ 1782447513339428864
author Alvarsson, Jonathan
Lampa, Samuel
Schaal, Wesley
Andersson, Claes
Wikberg, Jarl E. S.
Spjuth, Ola
author_facet Alvarsson, Jonathan
Lampa, Samuel
Schaal, Wesley
Andersson, Claes
Wikberg, Jarl E. S.
Spjuth, Ola
author_sort Alvarsson, Jonathan
collection PubMed
description The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-016-0151-5) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4980776
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-49807762016-08-12 Large-scale ligand-based predictive modelling using support vector machines Alvarsson, Jonathan Lampa, Samuel Schaal, Wesley Andersson, Claes Wikberg, Jarl E. S. Spjuth, Ola J Cheminform Research Article The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-016-0151-5) contains supplementary material, which is available to authorized users. 
Springer International Publishing 2016-08-10 /pmc/articles/PMC4980776/ /pubmed/27516811 http://dx.doi.org/10.1186/s13321-016-0151-5 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Alvarsson, Jonathan
Lampa, Samuel
Schaal, Wesley
Andersson, Claes
Wikberg, Jarl E. S.
Spjuth, Ola
Large-scale ligand-based predictive modelling using support vector machines
title Large-scale ligand-based predictive modelling using support vector machines
title_full Large-scale ligand-based predictive modelling using support vector machines
title_fullStr Large-scale ligand-based predictive modelling using support vector machines
title_full_unstemmed Large-scale ligand-based predictive modelling using support vector machines
title_short Large-scale ligand-based predictive modelling using support vector machines
title_sort large-scale ligand-based predictive modelling using support vector machines
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4980776/
https://www.ncbi.nlm.nih.gov/pubmed/27516811
http://dx.doi.org/10.1186/s13321-016-0151-5
work_keys_str_mv AT alvarssonjonathan largescaleligandbasedpredictivemodellingusingsupportvectormachines
AT lampasamuel largescaleligandbasedpredictivemodellingusingsupportvectormachines
AT schaalwesley largescaleligandbasedpredictivemodellingusingsupportvectormachines
AT anderssonclaes largescaleligandbasedpredictivemodellingusingsupportvectormachines
AT wikbergjarles largescaleligandbasedpredictivemodellingusingsupportvectormachines
AT spjuthola largescaleligandbasedpredictivemodellingusingsupportvectormachines