AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds

Bibliographic Details
Main authors: Sorkun, Murat Cihan; Khetan, Abhishek; Er, Süleyman
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK, 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6687799/
https://www.ncbi.nlm.nih.gov/pubmed/31395888
http://dx.doi.org/10.1038/s41597-019-0151-1
collection PubMed
description Water is a ubiquitous solvent in chemistry and life. It is therefore no surprise that the aqueous solubility of compounds has a key role in various domains, including but not limited to drug discovery, paint, coating, and battery materials design. Measurement and prediction of aqueous solubility is a complex and prevailing challenge in chemistry. For the latter, different data-driven prediction models have recently been developed to augment the physics-based modeling approaches. To construct accurate data-driven estimation models, it is essential that the underlying experimental calibration data used by these models is of high fidelity and quality. Existing solubility datasets show variance in the chemical space of compounds covered, measurement methods, experimental conditions, but also in the non-standard representations, size, and accessibility of data. To address this problem, we generated a new database of compounds, AqSolDB, by merging a total of nine different aqueous solubility datasets, curating the merged data, standardizing and validating the compound representation formats, marking with reliability labels, and providing 2D descriptors of compounds as a Supplementary Resource.
id pubmed-6687799
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Sci Data, Data Descriptor, published 2019-08-08 by Nature Publishing Group UK
rights © The Author(s) 2019. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
topic Data Descriptor