Predictive Capability of QSAR Models Based on the CompTox Zebrafish Embryo Assays: An Imbalanced Classification Problem

The CompTox Chemistry Dashboard (ToxCast) contains one of the largest public databases on zebrafish (Danio rerio) developmental toxicity. The data consists of 19 toxicological endpoints on 1018 unique compounds measured in relatively low concentration ranges. The endpoints are related to development...

Bibliographic Details
Main Authors: Lovrić, Mario; Malev, Olga; Klobučar, Göran; Kern, Roman; Liu, Jay J.; Lučić, Bono
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7998177/
https://www.ncbi.nlm.nih.gov/pubmed/33803931
http://dx.doi.org/10.3390/molecules26061617
author Lovrić, Mario
Malev, Olga
Klobučar, Göran
Kern, Roman
Liu, Jay J.
Lučić, Bono
collection PubMed
description The CompTox Chemistry Dashboard (ToxCast) contains one of the largest public databases on zebrafish (Danio rerio) developmental toxicity. The data consists of 19 toxicological endpoints on 1018 unique compounds measured in relatively low concentration ranges. The endpoints are related to developmental effects occurring in dechorionated zebrafish embryos for 120 hours post fertilization and monitored via gross malformations and mortality. We report the predictive capability of 209 quantitative structure–activity relationship (QSAR) models developed by machine learning methods using penalization techniques and diverse model quality metrics to cope with the imbalanced endpoints. All these QSAR models were generated to test how well the imbalanced classification (toxic or non-toxic) endpoints could be predicted regardless of which of three algorithms is used: logistic regression, multi-layer perceptron, or random forests. Additionally, QSAR toxicity models were developed starting from sets of classical molecular descriptors, structural fingerprints and their combinations. Only 8 out of 209 models passed the Matthews correlation coefficient value of 0.20, defined a priori as the threshold for acceptable model quality on the test sets. The best models were obtained for the endpoints mortality (MORT), ActivityScore and JAW (deformation). The low predictability of the QSAR models developed from the zebrafish embryotoxicity data in the database is mainly due to the higher sensitivity of the 19 endpoint measurements carried out on dechorionated embryos at low concentrations.
format Online
Article
Text
id pubmed-7998177
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7998177 2021-03-28 Molecules Article MDPI 2021-03-15 /pmc/articles/PMC7998177/ /pubmed/33803931 http://dx.doi.org/10.3390/molecules26061617 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Predictive Capability of QSAR Models Based on the CompTox Zebrafish Embryo Assays: An Imbalanced Classification Problem
topic Article
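Illustrative note on the quality criterion named in the description: the abstract states that a Matthews correlation coefficient (MCC) of 0.20 on the test set was the a priori threshold for an acceptable model. The sketch below shows how MCC is computed from confusion-matrix counts and why it suits imbalanced endpoints better than accuracy. It is a minimal illustration, not the authors' code: the function names, the example counts, the convention of returning 0.0 when the denominator is zero, and the strict ">" comparison against the threshold are all assumptions.

```python
import math

# Threshold taken from the abstract; the strict ">" comparison below is an assumption.
MCC_THRESHOLD = 0.20


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here), e.g. when a model predicts only one class.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def passes_threshold(tp, tn, fp, fn, threshold=MCC_THRESHOLD):
    """True if the model's MCC exceeds the acceptance threshold."""
    return matthews_corrcoef(tp, tn, fp, fn) > threshold


# Hypothetical imbalanced test set: 20 toxic and 180 non-toxic compounds.
# A model that always predicts "non-toxic" is 90% accurate, yet its MCC is 0.
always_negative = matthews_corrcoef(tp=0, tn=180, fp=0, fn=20)

# A hypothetical model that recovers 8 of the 20 toxic compounds
# at the cost of 10 false positives.
some_model = matthews_corrcoef(tp=8, tn=170, fp=10, fn=12)

print(f"always non-toxic: MCC = {always_negative:.2f}")
print(f"hypothetical model: MCC = {some_model:.2f}")
```

The always-negative case shows why accuracy alone is misleading on such endpoints: the majority class dominates accuracy, while MCC stays at zero unless both classes are predicted better than chance.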