
The Relevance of Goodness‐of‐fit, Robustness and Prediction Validation Categories of OECD‐QSAR Principles with Respect to Sample Size and Model Type

We investigated the relevance of the validation principles issued by the Organisation for Economic Co‐operation and Development for Quantitative Structure‐Activity Relationship models. We checked the goodness‐of‐fit, robustness and predictivity categories in linear and nonlinear models using benc...


Bibliographic Details
Main authors: Király, Péter, Kiss, Ramóna, Kovács, Dániel, Ballaj, Amine, Tóth, Gergely
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9787734/
https://www.ncbi.nlm.nih.gov/pubmed/35773201
http://dx.doi.org/10.1002/minf.202200072
_version_ 1784858583435837440
author Király, Péter
Kiss, Ramóna
Kovács, Dániel
Ballaj, Amine
Tóth, Gergely
collection PubMed
description We investigated the relevance of the validation principles issued by the Organisation for Economic Co‐operation and Development for Quantitative Structure‐Activity Relationship models. We checked the goodness‐of‐fit, robustness and predictivity categories in linear and nonlinear models using benchmark datasets. Most of our conclusions are drawn from the sample size dependence of the different validation parameters. We found that the goodness‐of‐fit parameters misleadingly overestimate model quality on small samples. For neural network and support vector models, the feasibility of the goodness‐of‐fit parameters can often be questioned. We propose using the simplest y‐scrambling method to estimate chance correlation. We found that the leave‐one‐out and leave‐many‐out cross‐validation parameters can be rescaled to each other in all models, so the computationally feasible method should be chosen depending on the model type. We assessed the interdependence of the validation parameters by calculating their rank correlations. Goodness of fit and robustness correlate quite well over sample size for linear models, so one of the approaches might be redundant. In the rank correlation between internal and external validation parameters, we found that the assignment of well and poorly modellable data to the training or the test set causes negative correlations.
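The three internal validation ideas named in the description (goodness of fit, leave‐one‐out cross‐validation, and y‐scrambling) can be illustrated with a minimal sketch. Everything below is an assumption for illustration and is not taken from the article: the data are synthetic, the model is ordinary least squares, the helper names (`fit_ols`, `q2_loo`, `y_scrambled_r2`) are hypothetical, and plain permutation of the response stands in for whichever y‐scrambling variant the authors consider simplest.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ [1, X] (intercept plus descriptors)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predictions of the fitted linear model for the rows of X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def r2(y, y_hat):
    """Goodness of fit: 1 - RSS/TSS, evaluated on the data used for fitting."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def q2_loo(X, y):
    """Leave-one-out Q2: each sample is predicted by a model fitted without it."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_ols(X[keep], y[keep])
        press += (y[i] - predict_ols(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def y_scrambled_r2(X, y, n_perm=100, seed=0):
    """Mean R2 after randomly permuting y: an estimate of chance correlation."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        scores.append(r2(y_perm, predict_ols(fit_ols(X, y_perm), X)))
    return float(np.mean(scores))

# Synthetic small sample (assumed sizes): 20 compounds, 5 descriptors,
# only the first descriptor carries signal.
rng = np.random.default_rng(42)
n, p = 20, 5
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.5 * rng.normal(size=n)

r2_fit = r2(y, predict_ols(fit_ols(X, y), X))
q2 = q2_loo(X, y)
r2_scr = y_scrambled_r2(X, y)
print(f"R2 (fit) = {r2_fit:.3f}, Q2 (LOO) = {q2:.3f}, mean scrambled R2 = {r2_scr:.3f}")
```

In this setup the scrambled R2 stays well above zero because five descriptors are fitted to only twenty samples, which is one way to see why goodness‐of‐fit values on small samples should be read against a chance-correlation baseline.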
format Online
Article
Text
id pubmed-9787734
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-9787734 2022-12-28 Mol Inform Research Articles John Wiley and Sons Inc. 2022-07-25 2022-11 /pmc/articles/PMC9787734/ /pubmed/35773201 http://dx.doi.org/10.1002/minf.202200072 Text en © 2022 The Authors. Molecular Informatics published by Wiley-VCH GmbH. This is an open access article under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title The Relevance of Goodness‐of‐fit, Robustness and Prediction Validation Categories of OECD‐QSAR Principles with Respect to Sample Size and Model Type
topic Research Articles