Ambiguity of non-systematic chemical identifiers within and between small-molecule databases
BACKGROUND: A wide range of chemical compound databases are currently available for pharmaceutical research. To retrieve compound information, including structures, researchers can query these chemical databases using non-systematic identifiers. These are source-dependent identifiers (e.g., brand na...
Main authors: | Akhondi, Saber A.; Muresan, Sorel; Williams, Antony J.; Kors, Jan A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2015 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4646925/ https://www.ncbi.nlm.nih.gov/pubmed/26579214 http://dx.doi.org/10.1186/s13321-015-0102-6 |
_version_ | 1782401000198373376 |
---|---|
author | Akhondi, Saber A. Muresan, Sorel Williams, Antony J. Kors, Jan A. |
author_sort | Akhondi, Saber A. |
collection | PubMed |
description | BACKGROUND: A wide range of chemical compound databases are currently available for pharmaceutical research. To retrieve compound information, including structures, researchers can query these chemical databases using non-systematic identifiers. These are source-dependent identifiers (e.g., brand names, generic names), which are usually assigned to the compound at the point of registration. The correctness of non-systematic identifiers (i.e., whether an identifier matches the associated structure) can only be assessed manually, which is cumbersome, but it is possible to automatically check their ambiguity (i.e., whether an identifier matches more than one structure). In this study we have quantified the ambiguity of non-systematic identifiers within and between eight widely used chemical databases. We also studied the effect of chemical structure standardization on reducing the ambiguity of non-systematic identifiers. RESULTS: The ambiguity of non-systematic identifiers within databases varied from 0.1 to 15.2 % (median 2.5 %). Standardization reduced the ambiguity only to a small extent for most databases. A wide range of ambiguity existed for non-systematic identifiers that are shared between databases (17.7–60.2 %, median of 40.3 %). Removing stereochemistry information provided the largest reduction in ambiguity across databases (median reduction 13.7 percentage points). CONCLUSIONS: Ambiguity of non-systematic identifiers within chemical databases is generally low, but ambiguity of non-systematic identifiers that are shared between databases is high. Chemical structure standardization reduces the ambiguity to a limited extent. Our findings can help to improve database integration, curation, and maintenance. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-015-0102-6) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4646925 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-4646925 2015-11-18 Ambiguity of non-systematic chemical identifiers within and between small-molecule databases Akhondi, Saber A. Muresan, Sorel Williams, Antony J. Kors, Jan A. J Cheminform Research Article Springer International Publishing 2015-11-16 /pmc/articles/PMC4646925/ /pubmed/26579214 http://dx.doi.org/10.1186/s13321-015-0102-6 Text en © Akhondi et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Ambiguity of non-systematic chemical identifiers within and between small-molecule databases |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4646925/ https://www.ncbi.nlm.nih.gov/pubmed/26579214 http://dx.doi.org/10.1186/s13321-015-0102-6 |
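
The abstract in this record defines an identifier as ambiguous when it matches more than one structure, and reports that removing stereochemistry information reduced ambiguity the most. The sketch below illustrates that definition only; it is not the authors' pipeline. The function names `ambiguity` and `drop_stereo`, the toy records, and the made-up `|R` / `|S` stereo suffix are all assumptions introduced for illustration.

```python
from collections import defaultdict


def ambiguity(records, standardize=lambda structure: structure):
    """Fraction of identifiers linked to more than one distinct structure.

    `records` is an iterable of (identifier, structure) pairs.
    `standardize` is a hypothetical hook applied to each structure
    before comparison (identity by default).
    """
    structures_by_identifier = defaultdict(set)
    for identifier, structure in records:
        structures_by_identifier[identifier].add(standardize(structure))
    if not structures_by_identifier:
        return 0.0
    ambiguous = sum(
        1 for structures in structures_by_identifier.values() if len(structures) > 1
    )
    return ambiguous / len(structures_by_identifier)


def drop_stereo(structure):
    """Strip the invented '|R' / '|S' stereo suffix used in the toy data."""
    return structure.split("|")[0]


# Invented toy data: not taken from any real database.
records = [
    ("name-1", "A"),
    ("name-1", "A"),    # repeated pair, still a single structure
    ("name-2", "B|R"),
    ("name-2", "B|S"),  # differs from the previous record only in stereo
    ("name-3", "C"),
    ("name-3", "D"),    # genuinely different structures
    ("name-4", "E"),
]

print(ambiguity(records))               # 0.5  (name-2 and name-3 are ambiguous)
print(ambiguity(records, drop_stereo))  # 0.25 (only name-3 remains ambiguous)
```

Passing the standardization step as a function keeps the counting logic separate from any particular structure representation, so the same count can be repeated under different standardization settings.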