Large-scale ligand-based predictive modelling using support vector machines
The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on o...
Main authors: | Alvarsson, Jonathan; Lampa, Samuel; Schaal, Wesley; Andersson, Claes; Wikberg, Jarl E. S.; Spjuth, Ola |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2016 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4980776/ https://www.ncbi.nlm.nih.gov/pubmed/27516811 http://dx.doi.org/10.1186/s13321-016-0151-5 |
_version_ | 1782447513339428864 |
---|---|
author | Alvarsson, Jonathan Lampa, Samuel Schaal, Wesley Andersson, Claes Wikberg, Jarl E. S. Spjuth, Ola |
author_sort | Alvarsson, Jonathan |
collection | PubMed |
description | The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-016-0151-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4980776 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-4980776 2016-08-12 Large-scale ligand-based predictive modelling using support vector machines. Alvarsson, Jonathan; Lampa, Samuel; Schaal, Wesley; Andersson, Claes; Wikberg, Jarl E. S.; Spjuth, Ola. J Cheminform. Research Article. Springer International Publishing 2016-08-10 /pmc/articles/PMC4980776/ /pubmed/27516811 http://dx.doi.org/10.1186/s13321-016-0151-5 Text en © The Author(s) 2016. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Large-scale ligand-based predictive modelling using support vector machines |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4980776/ https://www.ncbi.nlm.nih.gov/pubmed/27516811 http://dx.doi.org/10.1186/s13321-016-0151-5 |