Exploring the octanol–water partition coefficient dataset using deep learning techniques and data augmentation
Today more and more data are freely available. Based on these big datasets deep neural networks (DNNs) rapidly gain relevance in computational chemistry. Here, we explore the potential of DNNs to predict chemical properties from chemical structures. We have selected the octanol-water partition coeff...
Main authors: | Ulrich, Nadin; Goss, Kai-Uwe; Ebert, Andrea |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9814212/ https://www.ncbi.nlm.nih.gov/pubmed/36697535 http://dx.doi.org/10.1038/s42004-021-00528-9 |
_version_ | 1784864085341372416 |
---|---|
author | Ulrich, Nadin Goss, Kai-Uwe Ebert, Andrea |
author_facet | Ulrich, Nadin Goss, Kai-Uwe Ebert, Andrea |
author_sort | Ulrich, Nadin |
collection | PubMed |
description | Today more and more data are freely available. Based on these big datasets deep neural networks (DNNs) rapidly gain relevance in computational chemistry. Here, we explore the potential of DNNs to predict chemical properties from chemical structures. We have selected the octanol-water partition coefficient (log P) as an example, which plays an essential role in environmental chemistry and toxicology but also in chemical analysis. The predictive performance of the developed DNN is good with an rmse of 0.47 log units in the test dataset and an rmse of 0.33 for an external dataset from the SAMPL6 challenge. To this end, we trained the DNN using data augmentation considering all potential tautomeric forms of the chemicals. We further demonstrate how DNN models can help in the curation of the log P dataset by identifying potential errors, and address limitations of the dataset itself. |
format | Online Article Text |
id | pubmed-9814212 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-98142122023-01-10 Exploring the octanol–water partition coefficient dataset using deep learning techniques and data augmentation Ulrich, Nadin Goss, Kai-Uwe Ebert, Andrea Commun Chem Article Today more and more data are freely available. Based on these big datasets deep neural networks (DNNs) rapidly gain relevance in computational chemistry. Here, we explore the potential of DNNs to predict chemical properties from chemical structures. We have selected the octanol-water partition coefficient (log P) as an example, which plays an essential role in environmental chemistry and toxicology but also in chemical analysis. The predictive performance of the developed DNN is good with an rmse of 0.47 log units in the test dataset and an rmse of 0.33 for an external dataset from the SAMPL6 challenge. To this end, we trained the DNN using data augmentation considering all potential tautomeric forms of the chemicals. We further demonstrate how DNN models can help in the curation of the log P dataset by identifying potential errors, and address limitations of the dataset itself. Nature Publishing Group UK 2021-06-14 /pmc/articles/PMC9814212/ /pubmed/36697535 http://dx.doi.org/10.1038/s42004-021-00528-9 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. 
If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Ulrich, Nadin Goss, Kai-Uwe Ebert, Andrea Exploring the octanol–water partition coefficient dataset using deep learning techniques and data augmentation |
title | Exploring the octanol–water partition coefficient dataset using deep learning techniques and data augmentation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9814212/ https://www.ncbi.nlm.nih.gov/pubmed/36697535 http://dx.doi.org/10.1038/s42004-021-00528-9 |