
On InChI and evaluating the quality of cross-reference links

BACKGROUND: There are many databases of small molecules focused on different aspects of research and its applications. Some tasks may require integration of information from various databases. However, determining which entries from different databases represent the same compound is not straightforw...


Bibliographic Details
Main authors: Galgonek, Jakub; Vondrášek, Jiří
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4005828/
https://www.ncbi.nlm.nih.gov/pubmed/24742140
http://dx.doi.org/10.1186/1758-2946-6-15
_version_ 1782314159975694336
author Galgonek, Jakub
Vondrášek, Jiří
author_facet Galgonek, Jakub
Vondrášek, Jiří
author_sort Galgonek, Jakub
collection PubMed
description BACKGROUND: There are many databases of small molecules, each focused on different aspects of research and its applications. Some tasks may require integrating information from several of these databases. However, determining which entries from different databases represent the same compound is not straightforward. Integration can be based, for example, on automatically generated cross-reference links between entries. Another approach is to use the manually curated links stored directly in the databases. This study employs well-established InChI identifiers to measure the consistency and completeness of the manually curated links by comparing them with the automatically generated ones. RESULTS: We used two different tools to generate InChI identifiers and observed some ambiguities in their outputs. In part, these ambiguities were caused by unclear interpretation of the structural data used. InChI identifiers were used successfully to find duplicate entries in databases. We found that the InChI inconsistency of the manually curated links is very high (28.85% in the worst case). Even under a weaker definition of consistency, the measured inconsistency values were very high in general. The completeness of the manually curated links was also very poor (only 93.8% in the best case) compared with that of the automatically generated links. CONCLUSIONS: We observed several problems with the InChI tools and the files used as their inputs. There are large gaps in the consistency and completeness of manually curated links when they are measured using InChI identifiers. However, inconsistency can be caused both by errors in the manually curated links and by the inherent limitations of the InChI method.
format Online
Article
Text
id pubmed-4005828
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4005828 2014-05-01 On InChI and evaluating the quality of cross-reference links Galgonek, Jakub Vondrášek, Jiří J Cheminform Research Article
BioMed Central 2014-04-17 /pmc/articles/PMC4005828/ /pubmed/24742140 http://dx.doi.org/10.1186/1758-2946-6-15 Text en Copyright © 2014 Galgonek and Vondrášek; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Galgonek, Jakub
Vondrášek, Jiří
On InChI and evaluating the quality of cross-reference links
title On InChI and evaluating the quality of cross-reference links
title_sort on inchi and evaluating the quality of cross-reference links
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4005828/
https://www.ncbi.nlm.nih.gov/pubmed/24742140
http://dx.doi.org/10.1186/1758-2946-6-15
work_keys_str_mv AT galgonekjakub oninchiandevaluatingthequalityofcrossreferencelinks
AT vondrasekjiri oninchiandevaluatingthequalityofcrossreferencelinks
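
The comparison described in the abstract (manually curated links checked against links generated from matching InChI identifiers) can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy entry IDs, and the exact definitions of inconsistency and completeness used here are assumptions chosen for illustration, and the paper's own definitions may differ. Each database is modelled as a plain mapping from entry ID to a standard InChI string.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]  # (entry ID in database A, entry ID in database B)


def find_duplicates(db: Dict[str, str]) -> Dict[str, List[str]]:
    """Group entry IDs by InChI; keep only InChIs shared by two or more entries."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry_id, inchi in db.items():
        groups[inchi].append(entry_id)
    return {inchi: sorted(ids) for inchi, ids in groups.items() if len(ids) > 1}


def generate_links(db_a: Dict[str, str], db_b: Dict[str, str]) -> Set[Link]:
    """Automatically generated links: every pair of entries with identical InChI."""
    by_inchi: Dict[str, List[str]] = defaultdict(list)
    for entry_b, inchi in db_b.items():
        by_inchi[inchi].append(entry_b)
    return {(a, b) for a, inchi in db_a.items() for b in by_inchi.get(inchi, [])}


def inconsistency(curated: Set[Link], db_a: Dict[str, str], db_b: Dict[str, str]) -> float:
    """Assumed definition: share of curated links whose two entries have different InChIs.

    Links pointing at entries without a known InChI are skipped.
    """
    checkable = [(a, b) for a, b in curated if a in db_a and b in db_b]
    if not checkable:
        return 0.0
    mismatched = sum(1 for a, b in checkable if db_a[a] != db_b[b])
    return mismatched / len(checkable)


def completeness(curated: Set[Link], generated: Set[Link]) -> float:
    """Assumed definition: share of InChI-generated links that also exist as curated links."""
    if not generated:
        return 1.0
    return len(curated & generated) / len(generated)


# Toy data with hypothetical entry IDs; the InChI strings are those of
# ethanol, methanol and water.
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
METHANOL = "InChI=1S/CH4O/c1-2/h2H,1H3"
WATER = "InChI=1S/H2O/h1H2"

db_a = {"a1": ETHANOL, "a2": METHANOL, "a3": WATER}
db_b = {"b1": ETHANOL, "b2": METHANOL, "b3": WATER, "b4": ETHANOL}

# Hypothetical curated links; ("a2", "b3") wrongly links methanol to water.
curated_links = {("a1", "b1"), ("a2", "b3")}
generated_links = generate_links(db_a, db_b)

print(find_duplicates(db_b))
print(inconsistency(curated_links, db_a, db_b))
print(completeness(curated_links, generated_links))
```

In this toy case one of the two curated links connects entries with different InChIs, and only one of the four InChI-generated links is present among the curated ones. As the abstract notes, a mismatch flagged this way may reflect either an error in the curated link or a limitation of the InChI method itself, so such figures are a starting point for review rather than a verdict.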