InChIKey collision resistance: an experimental testing

InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isome...

Bibliographic Details
Main authors: Pletnev, Igor; Erin, Andrey; McNaught, Alan; Blinov, Kirill; Tchekhovskoi, Dmitrii; Heller, Steve
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3558395/
https://www.ncbi.nlm.nih.gov/pubmed/23256896
http://dx.doi.org/10.1186/1758-2946-4-39
author Pletnev, Igor
Erin, Andrey
McNaught, Alan
Blinov, Kirill
Tchekhovskoi, Dmitrii
Heller, Steve
collection PubMed
description InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isomerism (stereo, tautomeric, etc.). InChIKey is designed to be a nearly unique substitute for the parent InChI. However, a single InChIKey may occasionally map to two or more InChI strings (collision). The appearance of collision itself does not compromise the signature as collision-free hashing is impossible; the only viable approach is to set and keep a reasonable level of collision resistance which is sufficient for typical applications. We tested, in computational experiments, how well the real-life InChIKey collision resistance corresponds to the theoretical estimates expected by design. For this purpose, we analyzed the statistical characteristics of InChIKey for datasets of variable size in comparison to the theoretical statistical frequencies. For the relatively short second block, an exhaustive direct testing was performed. We computed and compared to theory the numbers of collisions for the stereoisomers of Spongistatin I (using the whole set of 67,108,864 isomers and its subsets). For the longer first block, we generated, using custom-made software, InChIKeys for more than 3 × 10^10 chemical structures. The statistical behavior of this block was tested by comparison of experimental and theoretical frequencies for the various four-letter sequences which may appear in the first block body. From the results of our computational experiments we conclude that the observed characteristics of InChIKey collision resistance are in good agreement with theoretical expectations.
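The comparison of observed and theoretical collision counts described in the abstract can be illustrated with a small sketch. It is not the authors' software and does not produce real InChIKeys: it truncates SHA-256 to a chosen number of bits and counts colliding pairs among arbitrary strings, then compares the count with the usual birthday-style estimate n(n-1)/2 divided by the number of possible hash values. The function names, the synthetic "structure-i" strings, and the 37-bit figure for the second-block hash are assumptions introduced here; none of them is stated in this record.

```python
import hashlib
from collections import Counter


def expected_colliding_pairs(n_items: int, hash_bits: int) -> float:
    """Birthday-style estimate of the expected number of colliding pairs
    when n_items are hashed uniformly into 2**hash_bits values."""
    return n_items * (n_items - 1) / 2 / 2 ** hash_bits


def truncated_sha256(text: str, hash_bits: int) -> int:
    """Return the leading hash_bits bits of SHA-256(text) as an integer."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - hash_bits)


def observed_colliding_pairs(strings, hash_bits: int) -> int:
    """Count pairs of input strings that share the same truncated hash."""
    counts = Counter(truncated_sha256(s, hash_bits) for s in strings)
    return sum(c * (c - 1) // 2 for c in counts.values())


if __name__ == "__main__":
    # Small experiment: a 16-bit hash and synthetic placeholder strings
    # (hypothetical stand-ins for InChI strings) so collisions appear quickly.
    demo_bits = 16
    demo_strings = [f"structure-{i}" for i in range(2000)]
    print("observed pairs:", observed_colliding_pairs(demo_strings, demo_bits))
    print("expected pairs:", expected_colliding_pairs(len(demo_strings), demo_bits))

    # ASSUMPTION: 37 bits for the second-block hash (not given in this record).
    second_block_bits = 37
    n_isomers = 67_108_864  # Spongistatin I stereoisomer count from the abstract
    print("theoretical pairs:", expected_colliding_pairs(n_isomers, second_block_bits))
```

Under the 37-bit assumption, the estimate for the full set of 67,108,864 isomers comes to roughly 16,384 colliding pairs; that figure is only what the formula yields and is not a result quoted from the article.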
format Online
Article
Text
id pubmed-3558395
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3558395 2013-01-31 InChIKey collision resistance: an experimental testing Pletnev, Igor; Erin, Andrey; McNaught, Alan; Blinov, Kirill; Tchekhovskoi, Dmitrii; Heller, Steve. J Cheminform Research Article BioMed Central 2012-12-20 /pmc/articles/PMC3558395/ /pubmed/23256896 http://dx.doi.org/10.1186/1758-2946-4-39 Text en Copyright ©2012 Pletnev et al.; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title InChIKey collision resistance: an experimental testing
topic Research Article