QSAR modelling of a large imbalanced aryl hydrocarbon activation dataset by rational and random sampling and screening of 80,086 REACH pre-registered and/or registered substances

Bibliographic details
Main authors: Klimenko, Kyrylo; Rosenberg, Sine A.; Dybdahl, Marianne; Wedebye, Eva B.; Nikolov, Nikolai G.
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6417725/
https://www.ncbi.nlm.nih.gov/pubmed/30870500
http://dx.doi.org/10.1371/journal.pone.0213848
Collection: PubMed

Abstract

The Aryl hydrocarbon receptor (AhR) plays important roles in many normal and pathological physiological processes, including endocrine homeostasis, foetal development, cell cycle regulation, cellular oxidation/antioxidation, immune regulation, metabolism of endogenous and exogenous substances, and carcinogenesis. An experimental data set for human in vitro AhR activation comprising 324,858 substances, of which 1,982 were confirmed actives, was used to test an in-house-developed approach to rationally select Quantitative Structure-Activity Relationship (QSAR) training set substances from an unbalanced data set. In the first iteration, active and inactive substances were selected at random to make QSAR models. Then, more inactive substances were added to the training set in two further iterations based on incorrect or out-of-domain predictions to produce larger models. The resulting ‘rational’ model, comprising 832 actives and four times as many inactives, i.e. 3,328, was compared to a model with a training set of the same size and proportion of inactives, chosen entirely at random. Both models underwent robust cross-validation and external validation, showing good statistical performance, with the rational model having external validation sensitivity of 85.1% and specificity of 97.1%, compared to the random model with sensitivity 89.1% and specificity 91.3%. Furthermore, we integrated the training sets for both models with the 93 external validation test set actives and 372 randomly selected inactives to make two final models. They also underwent external validations for specificity and cross-validations, which confirmed that good predictivity was maintained. All developed models were applied to predict 80,086 EU REACH substances. The rational and random final models had 63.1% and 56.9% coverage of the REACH set, respectively, and predicted 1,256 and 3,214 substances as actives. The final models as well as predictions for AhR activation for 650,000 substances will be published in the Danish (Q)SAR Database and can, for example, be used for priority setting, in read-across predictions and in weight-of-evidence assessments of chemicals.
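The iterative ‘rational’ selection summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fingerprint representation (`Fingerprint`), the 1-nearest-neighbour stand-in model (`predict`), the Tanimoto cutoff `min_sim` used as an applicability domain, and the equal per-iteration budget in `rational_sample` are all hypothetical. The sketch also shows the standard sensitivity and specificity definitions used to compare the two models; skipping out-of-domain predictions when computing them is likewise an assumption.

```python
import random
from typing import FrozenSet, List, Optional, Sequence, Tuple

# Hypothetical representation: a substance is the set of "on" bits of a
# structural fingerprint (not the descriptors used in the paper).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict(train: Sequence[Tuple[Fingerprint, int]], x: Fingerprint,
            min_sim: float) -> Optional[int]:
    """Toy 1-nearest-neighbour stand-in for a QSAR model.

    Returns 1 (active) or 0 (inactive), or None if x is out of domain,
    i.e. no training substance reaches the similarity cutoff min_sim.
    """
    best_sim, best_label = max((tanimoto(x, fp), y) for fp, y in train)
    return best_label if best_sim >= min_sim else None


def rational_sample(actives: List[Fingerprint], inactives: List[Fingerprint],
                    ratio: int = 4, iterations: int = 3,
                    min_sim: float = 0.5, seed: int = 0) -> List[Fingerprint]:
    """Select up to ratio * len(actives) inactives over several iterations.

    Iteration 1 draws inactives at random; each later iteration adds only
    inactives the current model predicts as active or as out of domain.
    """
    rng = random.Random(seed)
    target = ratio * len(actives)          # e.g. 832 actives -> 3,328 inactives
    pool = list(range(len(inactives)))     # indices of unselected inactives
    rng.shuffle(pool)
    first = target // iterations           # hypothetical equal budget split
    chosen, pool = pool[:first], pool[first:]
    for it in range(1, iterations):
        train = [(fp, 1) for fp in actives] + [(inactives[i], 0) for i in chosen]
        # "Hard" inactives: wrongly predicted active (1) or out of domain (None).
        hard = [i for i in pool if predict(train, inactives[i], min_sim) != 0]
        picked = hard[:(target - len(chosen)) // (iterations - it)]
        chosen += picked
        picked_set = set(picked)
        pool = [i for i in pool if i not in picked_set]
    return [inactives[i] for i in chosen]


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP).

    Out-of-domain predictions (None) are skipped (an assumption of this sketch).
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data (hypothetical): actives share bits 0-4; inactives are random bit sets.
    actives = [frozenset({0, 1, 2, 3, 4} | set(rng.sample(range(5, 60), 5)))
               for _ in range(10)]
    inactives = [frozenset(rng.sample(range(5, 60), 10)) for _ in range(200)]
    selected = rational_sample(actives, inactives)
    print(len(actives), "actives,", len(selected), "inactives selected")
```

With the abstract's figures, 832 actives at `ratio=4` gives a target of 3,328 inactives, the same 1:4 proportion as the 93 actives and 372 inactives added to the final models. If a later iteration finds fewer hard inactives than its budget, this sketch simply returns a smaller set rather than topping it up at random.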

Record ID: pubmed-6417725
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article), published 2019-03-14
License: © 2019 Klimenko et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.