AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds
Water is a ubiquitous solvent in chemistry and life. It is therefore no surprise that the aqueous solubility of compounds has a key role in various domains, including but not limited to drug discovery, paint, coating, and battery materials design. Measurement and prediction of aqueous solubility is...
Main Authors: | Sorkun, Murat Cihan; Khetan, Abhishek; Er, Süleyman |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2019 |
Subjects: | Data Descriptor |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6687799/ https://www.ncbi.nlm.nih.gov/pubmed/31395888 http://dx.doi.org/10.1038/s41597-019-0151-1 |
_version_ | 1783442781350395904 |
---|---|
author | Sorkun, Murat Cihan; Khetan, Abhishek; Er, Süleyman |
author_sort | Sorkun, Murat Cihan |
collection | PubMed |
description | Water is a ubiquitous solvent in chemistry and life. It is therefore no surprise that the aqueous solubility of compounds has a key role in various domains, including but not limited to drug discovery, paint, coating, and battery materials design. Measurement and prediction of aqueous solubility is a complex and prevailing challenge in chemistry. For the latter, different data-driven prediction models have recently been developed to augment the physics-based modeling approaches. To construct accurate data-driven estimation models, it is essential that the underlying experimental calibration data used by these models is of high fidelity and quality. Existing solubility datasets show variance in the chemical space of compounds covered, measurement methods, experimental conditions, but also in the non-standard representations, size, and accessibility of data. To address this problem, we generated a new database of compounds, AqSolDB, by merging a total of nine different aqueous solubility datasets, curating the merged data, standardizing and validating the compound representation formats, marking with reliability labels, and providing 2D descriptors of compounds as a Supplementary Resource. |
format | Online Article Text |
id | pubmed-6687799 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6687799 2019-08-19 AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds Sorkun, Murat Cihan Khetan, Abhishek Er, Süleyman Sci Data Data Descriptor Nature Publishing Group UK 2019-08-08 /pmc/articles/PMC6687799/ /pubmed/31395888 http://dx.doi.org/10.1038/s41597-019-0151-1 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article. |
title | AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6687799/ https://www.ncbi.nlm.nih.gov/pubmed/31395888 http://dx.doi.org/10.1038/s41597-019-0151-1 |