OPERA models for predicting physicochemical properties and environmental fate endpoints

The collection of chemical structure information and associated experimental data for quantitative structure–activity/property relationship (QSAR/QSPR) modeling is facilitated by an increasing number of public databases containing large amounts of useful data. However, the performance of QSAR models...

Bibliographic Details
Main authors: Mansouri, Kamel; Grulke, Chris M.; Judson, Richard S.; Williams, Antony J.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5843579/
https://www.ncbi.nlm.nih.gov/pubmed/29520515
http://dx.doi.org/10.1186/s13321-018-0263-1
_version_ 1783305095642873856
author Mansouri, Kamel
Grulke, Chris M.
Judson, Richard S.
Williams, Antony J.
author_facet Mansouri, Kamel
Grulke, Chris M.
Judson, Richard S.
Williams, Antony J.
author_sort Mansouri, Kamel
collection PubMed
description The collection of chemical structure information and associated experimental data for quantitative structure–activity/property relationship (QSAR/QSPR) modeling is facilitated by an increasing number of public databases containing large amounts of useful data. However, the performance of QSAR models highly depends on the quality of the data and modeling methodology used. This study aims to develop robust QSAR/QSPR models for chemical properties of environmental interest that can be used for regulatory purposes. This study primarily uses data from the publicly available PHYSPROP database consisting of a set of 13 common physicochemical and environmental fate properties. These datasets have undergone extensive curation using an automated workflow to select only high-quality data, and the chemical structures were standardized prior to calculation of the molecular descriptors. The modeling procedure was developed based on the five Organization for Economic Cooperation and Development (OECD) principles for QSAR models. A weighted k-nearest neighbor approach was adopted using a minimum number of required descriptors calculated using PaDEL, an open-source software. The genetic algorithms selected only the most pertinent and mechanistically interpretable descriptors (2–15, with an average of 11 descriptors). The sizes of the modeled datasets varied from 150 chemicals for biodegradability half-life to 14,050 chemicals for logP, with an average of 3222 chemicals across all endpoints. The optimal models were built on randomly selected training sets (75%) and validated using fivefold cross-validation (CV) and test sets (25%). The CV Q² of the models varied from 0.72 to 0.95, with an average of 0.86 and an R² test value from 0.71 to 0.96, with an average of 0.82. Modeling and performance details are described in QSAR model reporting format and were validated by the European Commission’s Joint Research Center to be OECD compliant. 
All models are freely available as an open-source, command-line application called OPEn structure–activity/property Relationship App (OPERA). OPERA models were applied to more than 750,000 chemicals to produce freely available predicted data on the U.S. Environmental Protection Agency’s CompTox Chemistry Dashboard. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13321-018-0263-1) contains supplementary material, which is available to authorized users.
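The validation scheme named in the description (weighted k-nearest neighbors, a random 75%/25% split, fivefold cross-validation, Q² and R²) can be illustrated with a minimal sketch. This is not OPERA code. The synthetic data, the two made-up descriptors, k = 5, the inverse-distance weighting, and the pooled-prediction form of Q² are all assumptions chosen for illustration; the PaDEL descriptors and genetic-algorithm descriptor selection are not reproduced here.

```python
import math
import random


def weighted_knn_predict(train_X, train_y, x, k=5):
    """Inverse-distance-weighted mean of the k nearest training responses."""
    nearest = sorted(
        (math.dist(row, x), y) for row, y in zip(train_X, train_y)
    )[:k]
    # Exact matches take all the weight, which also avoids dividing by zero.
    exact = [y for d, y in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validated_q2(X, y, k=5, folds=5):
    """Pool out-of-fold predictions, then score them like R² (one common Q² variant)."""
    preds = [0.0] * len(X)
    for f in range(folds):
        held_out = set(range(f, len(X), folds))
        fit_X = [X[i] for i in range(len(X)) if i not in held_out]
        fit_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in held_out:
            preds[i] = weighted_knn_predict(fit_X, fit_y, X[i], k)
    return r_squared(y, preds)


# Synthetic stand-in for a curated dataset: two hypothetical descriptors
# and a noisy linear response (an assumption, not PHYSPROP data).
rng = random.Random(0)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(200)]
y = [2.0 * a - 0.5 * b + rng.gauss(0, 0.1) for a, b in X]

# Random 75% / 25% split into training and test sets.
indices = list(range(len(X)))
rng.shuffle(indices)
cut = int(0.75 * len(indices))
train_X = [X[i] for i in indices[:cut]]
train_y = [y[i] for i in indices[:cut]]
test_X = [X[i] for i in indices[cut:]]
test_y = [y[i] for i in indices[cut:]]

# Fivefold CV on the training set, then external validation on the test set.
q2 = cross_validated_q2(train_X, train_y)
test_pred = [weighted_knn_predict(train_X, train_y, x) for x in test_X]
r2_test = r_squared(test_y, test_pred)
print(f"CV Q2 = {q2:.3f}, test R2 = {r2_test:.3f}")
```

Reporting both numbers, as the description does, separates internal robustness (Q² from cross-validation on the training set) from predictivity on chemicals the model never saw (R² on the held-out 25%).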
format Online
Article
Text
id pubmed-5843579
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-58435792018-03-19 OPERA models for predicting physicochemical properties and environmental fate endpoints Mansouri, Kamel Grulke, Chris M. Judson, Richard S. Williams, Antony J. J Cheminform Research Article The collection of chemical structure information and associated experimental data for quantitative structure–activity/property relationship (QSAR/QSPR) modeling is facilitated by an increasing number of public databases containing large amounts of useful data. However, the performance of QSAR models highly depends on the quality of the data and modeling methodology used. This study aims to develop robust QSAR/QSPR models for chemical properties of environmental interest that can be used for regulatory purposes. This study primarily uses data from the publicly available PHYSPROP database consisting of a set of 13 common physicochemical and environmental fate properties. These datasets have undergone extensive curation using an automated workflow to select only high-quality data, and the chemical structures were standardized prior to calculation of the molecular descriptors. The modeling procedure was developed based on the five Organization for Economic Cooperation and Development (OECD) principles for QSAR models. A weighted k-nearest neighbor approach was adopted using a minimum number of required descriptors calculated using PaDEL, an open-source software. The genetic algorithms selected only the most pertinent and mechanistically interpretable descriptors (2–15, with an average of 11 descriptors). The sizes of the modeled datasets varied from 150 chemicals for biodegradability half-life to 14,050 chemicals for logP, with an average of 3222 chemicals across all endpoints. The optimal models were built on randomly selected training sets (75%) and validated using fivefold cross-validation (CV) and test sets (25%). 
The CV Q² of the models varied from 0.72 to 0.95, with an average of 0.86 and an R² test value from 0.71 to 0.96, with an average of 0.82. Modeling and performance details are described in QSAR model reporting format and were validated by the European Commission’s Joint Research Center to be OECD compliant. All models are freely available as an open-source, command-line application called OPEn structure–activity/property Relationship App (OPERA). OPERA models were applied to more than 750,000 chemicals to produce freely available predicted data on the U.S. Environmental Protection Agency’s CompTox Chemistry Dashboard. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13321-018-0263-1) contains supplementary material, which is available to authorized users. Springer International Publishing 2018-03-08 /pmc/articles/PMC5843579/ /pubmed/29520515 http://dx.doi.org/10.1186/s13321-018-0263-1 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Mansouri, Kamel
Grulke, Chris M.
Judson, Richard S.
Williams, Antony J.
OPERA models for predicting physicochemical properties and environmental fate endpoints
title OPERA models for predicting physicochemical properties and environmental fate endpoints
title_full OPERA models for predicting physicochemical properties and environmental fate endpoints
title_fullStr OPERA models for predicting physicochemical properties and environmental fate endpoints
title_full_unstemmed OPERA models for predicting physicochemical properties and environmental fate endpoints
title_short OPERA models for predicting physicochemical properties and environmental fate endpoints
title_sort opera models for predicting physicochemical properties and environmental fate endpoints
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5843579/
https://www.ncbi.nlm.nih.gov/pubmed/29520515
http://dx.doi.org/10.1186/s13321-018-0263-1
work_keys_str_mv AT mansourikamel operamodelsforpredictingphysicochemicalpropertiesandenvironmentalfateendpoints
AT grulkechrism operamodelsforpredictingphysicochemicalpropertiesandenvironmentalfateendpoints
AT judsonrichards operamodelsforpredictingphysicochemicalpropertiesandenvironmentalfateendpoints
AT williamsantonyj operamodelsforpredictingphysicochemicalpropertiesandenvironmentalfateendpoints