
Methodology of aiQSAR: a group-specific approach to QSAR modelling

BACKGROUND: Several QSAR methodology developments have shown promise in recent years. These include the consensus approach to generate the final prediction of a model, utilizing new, advanced machine learning algorithms and streamlining, standardization and automation of various QSAR steps. One appr...


Bibliographic Details
Main Authors: Vukovic, Kristijan; Gadaleta, Domenico; Benfenati, Emilio
Format: Online Article Text
Language: English
Published: Springer International Publishing 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6446381/
https://www.ncbi.nlm.nih.gov/pubmed/30945010
http://dx.doi.org/10.1186/s13321-019-0350-y
_version_ 1783408354033401856
author Vukovic, Kristijan
Gadaleta, Domenico
Benfenati, Emilio
author_facet Vukovic, Kristijan
Gadaleta, Domenico
Benfenati, Emilio
author_sort Vukovic, Kristijan
collection PubMed
description BACKGROUND: Several QSAR methodology developments have shown promise in recent years. These include the consensus approach to generate the final prediction of a model, utilizing new, advanced machine learning algorithms and streamlining, standardization and automation of various QSAR steps. One approach that seems under-explored is at-the-runtime generation of local models specific to individual compounds. This approach was quite likely limited by the computational requirements, but with current increases in processing power and the widespread availability of cluster-computing infrastructure, this limitation is no longer that severe.
RESULTS: We propose a new QSAR methodology: aiQSAR, whose aim is to generate endpoint predictions directly from the input dataset by building an array of local models generated at-the-runtime and specific for each compound in the dataset. The local group of each compound is selected on the basis of fingerprint similarities and the final prediction is calculated by integrating the results of a number of autonomous mathematical models. The method is applicable to regression, binary classification and multi-class classification and was tested on one dataset for each endpoint type: bioconcentration factor (BCF) for regression, Ames test for binary classification and Environmental Protection Agency (EPA) acute rat oral toxicity ranking for multi-class classification. As part of this method, the applicability domain of each prediction is assessed through the applicability domain measure, calculated on the basis of the fingerprint similarities in each local group of compounds.
CONCLUSIONS: We outline the methodology for a new QSAR-based predictive tool whose advantages are automation, group-specific approach to modelling and simplicity of execution. Our aim now will be to develop this method into a stand-alone software tool. We hope that eventual adoption of our tool would make QSAR modelling more accessible and transparent. Our methodology could be used as an initial modelling step, to predict new compounds by simply loading the training dataset as an input. Predictions could then be further evaluated and refined either by other tools or through optimization of aiQSAR parameters.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13321-019-0350-y) contains supplementary material, which is available to authorized users.
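The group-specific idea in the description above (a local group chosen by fingerprint similarity, a prediction that integrates several models, and an applicability domain measure derived from the similarities in the group) can be illustrated with a minimal Python sketch. This is not the aiQSAR implementation. The record does not state the similarity metric, the group size, the underlying models, or the formula for the applicability domain measure, so everything here is an assumption made for illustration: Tanimoto similarity over sets of on-bits, a group size `k`, two toy estimators (plain mean and similarity-weighted mean) standing in for the "autonomous mathematical models", and the mean group similarity standing in for the applicability domain measure. All function names are hypothetical.

```python
from statistics import mean


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits (assumed metric)."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def local_group(query, training, k=3):
    """Return (similarity, endpoint) pairs for the k training compounds most similar to the query."""
    scored = [(tanimoto(query, fp), y) for fp, y in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def predict(query, training, k=3):
    """Return (consensus prediction, applicability score) for one query compound.

    Assumptions: regression endpoint; two toy local estimators averaged as the
    consensus; mean similarity of the local group used as the applicability score.
    """
    group = local_group(query, training, k)
    sims = [s for s, _ in group]
    values = [y for _, y in group]

    plain = mean(values)                      # toy model 1: unweighted local mean
    total = sum(sims)
    weighted = (                              # toy model 2: similarity-weighted local mean
        sum(s * y for s, y in group) / total if total > 0 else plain
    )
    consensus = mean([plain, weighted])       # integrate the local models
    applicability = mean(sims)                # higher = query closer to its local group
    return consensus, applicability


# Hypothetical toy data: (fingerprint on-bits, endpoint value)
training = [
    (frozenset({1, 2, 3}), 1.0),
    (frozenset({1, 2, 4}), 2.0),
    (frozenset({7, 8, 9}), 10.0),
    (frozenset({1, 2, 3, 4}), 1.5),
]
query = frozenset({1, 2, 3})
prediction, applicability = predict(query, training, k=3)
print(prediction, applicability)
```

In this sketch the dissimilar compound `{7, 8, 9}` never enters the local group, so its outlying endpoint value does not influence the prediction; a low applicability score would flag queries whose nearest neighbours are all weakly similar.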
format Online
Article
Text
id pubmed-6446381
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-64463812019-04-15 Methodology of aiQSAR: a group-specific approach to QSAR modelling Vukovic, Kristijan Gadaleta, Domenico Benfenati, Emilio J Cheminform Methodology BACKGROUND: Several QSAR methodology developments have shown promise in recent years. These include the consensus approach to generate the final prediction of a model, utilizing new, advanced machine learning algorithms and streamlining, standardization and automation of various QSAR steps. One approach that seems under-explored is at-the-runtime generation of local models specific to individual compounds. This approach was quite likely limited by the computational requirements, but with current increases in processing power and the widespread availability of cluster-computing infrastructure, this limitation is no longer that severe. RESULTS: We propose a new QSAR methodology: aiQSAR, whose aim is to generate endpoint predictions directly from the input dataset by building an array of local models generated at-the-runtime and specific for each compound in the dataset. The local group of each compound is selected on the basis of fingerprint similarities and the final prediction is calculated by integrating the results of a number of autonomous mathematical models. The method is applicable to regression, binary classification and multi-class classification and was tested on one dataset for each endpoint type: bioconcentration factor (BCF) for regression, Ames test for binary classification and Environmental Protection Agency (EPA) acute rat oral toxicity ranking for multi-class classification. As part of this method, the applicability domain of each prediction is assessed through the applicability domain measure, calculated on the basis of the fingerprint similarities in each local group of compounds. CONCLUSIONS: We outline the methodology for a new QSAR-based predictive tool whose advantages are automation, group-specific approach to modelling and simplicity of execution. 
Our aim now will be to develop this method into a stand-alone software tool. We hope that eventual adoption of our tool would make QSAR modelling more accessible and transparent. Our methodology could be used as an initial modelling step, to predict new compounds by simply loading the training dataset as an input. Predictions could then be further evaluated and refined either by other tools or through optimization of aiQSAR parameters. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13321-019-0350-y) contains supplementary material, which is available to authorized users. Springer International Publishing 2019-04-03 /pmc/articles/PMC6446381/ /pubmed/30945010 http://dx.doi.org/10.1186/s13321-019-0350-y Text en © The Author(s) 2019 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology
Vukovic, Kristijan
Gadaleta, Domenico
Benfenati, Emilio
Methodology of aiQSAR: a group-specific approach to QSAR modelling
title Methodology of aiQSAR: a group-specific approach to QSAR modelling
title_full Methodology of aiQSAR: a group-specific approach to QSAR modelling
title_fullStr Methodology of aiQSAR: a group-specific approach to QSAR modelling
title_full_unstemmed Methodology of aiQSAR: a group-specific approach to QSAR modelling
title_short Methodology of aiQSAR: a group-specific approach to QSAR modelling
title_sort methodology of aiqsar: a group-specific approach to qsar modelling
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6446381/
https://www.ncbi.nlm.nih.gov/pubmed/30945010
http://dx.doi.org/10.1186/s13321-019-0350-y
work_keys_str_mv AT vukovickristijan methodologyofaiqsaragroupspecificapproachtoqsarmodelling
AT gadaletadomenico methodologyofaiqsaragroupspecificapproachtoqsarmodelling
AT benfenatiemilio methodologyofaiqsaragroupspecificapproachtoqsarmodelling