Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs)

Recent years have seen substantial growth in the adoption of machine learning approaches for quantitative structure-activity relationship (QSAR) development. This trend has coincided with a desire to shift the focus of methodology employed within chemical safety assessm...

Bibliographic Details
Main authors: Belfield, Samuel J., Cronin, Mark T.D., Enoch, Steven J., Firman, James W.
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10171609/
https://www.ncbi.nlm.nih.gov/pubmed/37163504
http://dx.doi.org/10.1371/journal.pone.0282924
author Belfield, Samuel J.
Cronin, Mark T.D.
Enoch, Steven J.
Firman, James W.
collection PubMed
description Recent years have seen substantial growth in the adoption of machine learning approaches for quantitative structure-activity relationship (QSAR) development. This trend has coincided with a desire to shift the focus of methodology employed within chemical safety assessment: away from traditional reliance upon animal-intensive in vivo protocols, and towards increased application of in silico (or computational) predictive toxicology. With QSAR central amongst the techniques applied in this area, algorithms trained through machine learning with the objective of toxicity estimation have, quite naturally, emerged. On account of the pattern-recognition capabilities of the underlying methods, the statistical power of the ensuing models is potentially considerable, appropriate for the handling even of vast, heterogeneous datasets. However, such potency comes at a price, manifesting as general practical deficits with respect to the reproducibility, interpretability and generalisability of the resulting tools. Unsurprisingly, these deficits have hindered broader uptake (most notably within a regulatory setting). Areas of uncertainty liable to accompany (and hence detract from the applicability of) toxicological QSAR have previously been highlighted, together with suggestions for “best practice” aimed at mitigating their influence. However, the scope of such exercises has remained limited to “classical” QSAR: that conducted through use of linear regression and related techniques, with the adoption of comparatively few features or descriptors. Accordingly, the intention of this study has been to extend the remit of best practice guidance, so as to address concerns specific to the employment of machine learning within the field. In doing so, the impact of strategies aimed at enhancing the transparency (feature importance, feature reduction), generalisability (cross-validation) and predictive power (hyperparameter optimisation) of algorithms, trained upon real toxicity data through six common learning approaches, is evaluated.
format Online
Article
Text
id pubmed-10171609
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10171609 2023-05-11 PLoS One Research Article Public Library of Science 2023-05-10 /pmc/articles/PMC10171609/ /pubmed/37163504 http://dx.doi.org/10.1371/journal.pone.0282924 Text en © 2023 Belfield et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs)
topic Research Article