Caveat Usor: Assessing Differences between Major Chemistry Databases
The three databases of PubChem, ChemSpider, and UniChem capture the majority of open chemical structure records with February 2018 totals of 95, 63, and 154 million, respectively. Collectively, they constitute a massively enabling resource for cheminformatics, chemical biology, and drug discovery. A...
Main Author: | Southan, Christopher |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc. 2018 |
Subjects: | Reviews |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5900829/ https://www.ncbi.nlm.nih.gov/pubmed/29451740 http://dx.doi.org/10.1002/cmdc.201700724 |
_version_ | 1783314489029951488 |
---|---|
author | Southan, Christopher |
collection | PubMed |
description | The three databases of PubChem, ChemSpider, and UniChem capture the majority of open chemical structure records with February 2018 totals of 95, 63, and 154 million, respectively. Collectively, they constitute a massively enabling resource for cheminformatics, chemical biology, and drug discovery. As meta‐portals, they subsume and link out to the major proportion of public bioactivity data extracted from the literature and screening center assay results. Therefore, they not only present three different entry points, but the many subsumed independent resources present a fourth entry point in the form of standalone databases. Because this creates a complex picture it is important for users to have at least some appreciation of differential content to enable utility judgments for the tasks at hand. This turns out to be challenging. By comparing the three resources in detail, this review assesses their differences, some of which are not obvious. This includes the fact that coverage is significantly different between the 587, 282, and 38 contributing sources, respectively. This not only presents the “who‐has‐what” question, but also the reason “why” any particular inclusion is considered valuable is rarely made explicit. Also confusing is that sources nominally in common (i.e., having the same submitter name) can have significantly different structure counts, not only in each of the three but also from their standalone instantiations. Assessing a series of examples indicates that differences in loading dates and structural standardization are the main causes of this inter‐portal discordance. |
format | Online Article Text |
id | pubmed-5900829 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-5900829 2018-04-23 Caveat Usor: Assessing Differences between Major Chemistry Databases Southan, Christopher ChemMedChem Reviews John Wiley and Sons Inc. 2018-02-23 2018-03-20 /pmc/articles/PMC5900829/ /pubmed/29451740 http://dx.doi.org/10.1002/cmdc.201700724 Text en © 2018 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
title | Caveat Usor: Assessing Differences between Major Chemistry Databases |
topic | Reviews |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5900829/ https://www.ncbi.nlm.nih.gov/pubmed/29451740 http://dx.doi.org/10.1002/cmdc.201700724 |