Ambiguity of non-systematic chemical identifiers within and between small-molecule databases

BACKGROUND: A wide range of chemical compound databases are currently available for pharmaceutical research. To retrieve compound information, including structures, researchers can query these chemical databases using non-systematic identifiers. These are source-dependent identifiers (e.g., brand na...

Bibliographic Details
Main authors: Akhondi, Saber A., Muresan, Sorel, Williams, Antony J., Kors, Jan A.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4646925/
https://www.ncbi.nlm.nih.gov/pubmed/26579214
http://dx.doi.org/10.1186/s13321-015-0102-6
author Akhondi, Saber A.
Muresan, Sorel
Williams, Antony J.
Kors, Jan A.
collection PubMed
description BACKGROUND: A wide range of chemical compound databases are currently available for pharmaceutical research. To retrieve compound information, including structures, researchers can query these chemical databases using non-systematic identifiers. These are source-dependent identifiers (e.g., brand names, generic names), which are usually assigned to the compound at the point of registration. The correctness of non-systematic identifiers (i.e., whether an identifier matches the associated structure) can only be assessed manually, which is cumbersome, but it is possible to automatically check their ambiguity (i.e., whether an identifier matches more than one structure). In this study we have quantified the ambiguity of non-systematic identifiers within and between eight widely used chemical databases. We also studied the effect of chemical structure standardization on reducing the ambiguity of non-systematic identifiers. RESULTS: The ambiguity of non-systematic identifiers within databases varied from 0.1 to 15.2 % (median 2.5 %). Standardization reduced the ambiguity only to a small extent for most databases. A wide range of ambiguity existed for non-systematic identifiers that are shared between databases (17.7–60.2 %, median 40.3 %). Removing stereochemistry information provided the largest reduction in ambiguity across databases (median reduction 13.7 percentage points). CONCLUSIONS: Ambiguity of non-systematic identifiers within chemical databases is generally low, but ambiguity of non-systematic identifiers that are shared between databases is high. Chemical structure standardization reduces the ambiguity to a limited extent. Our findings can help to improve database integration, curation, and maintenance. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-015-0102-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4646925
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-4646925 2015-11-18 J Cheminform Research Article Springer International Publishing 2015-11-16 /pmc/articles/PMC4646925/ /pubmed/26579214 http://dx.doi.org/10.1186/s13321-015-0102-6 Text en
© Akhondi et al. 2015 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Ambiguity of non-systematic chemical identifiers within and between small-molecule databases
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4646925/
https://www.ncbi.nlm.nih.gov/pubmed/26579214
http://dx.doi.org/10.1186/s13321-015-0102-6