InChIKey collision resistance: an experimental testing
InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isome...
Main authors: | Pletnev, Igor; Erin, Andrey; McNaught, Alan; Blinov, Kirill; Tchekhovskoi, Dmitrii; Heller, Steve |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3558395/ https://www.ncbi.nlm.nih.gov/pubmed/23256896 http://dx.doi.org/10.1186/1758-2946-4-39 |
_version_ | 1782257426223857664 |
---|---|
author | Pletnev, Igor; Erin, Andrey; McNaught, Alan; Blinov, Kirill; Tchekhovskoi, Dmitrii; Heller, Steve |
author_facet | Pletnev, Igor; Erin, Andrey; McNaught, Alan; Blinov, Kirill; Tchekhovskoi, Dmitrii; Heller, Steve |
author_sort | Pletnev, Igor |
collection | PubMed |
description | InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isomerism (stereo, tautomeric, etc.). InChIKey is designed to be a nearly unique substitute for the parent InChI. However, a single InChIKey may occasionally map to two or more InChI strings (collision). The appearance of collision itself does not compromise the signature as collision-free hashing is impossible; the only viable approach is to set and keep a reasonable level of collision resistance which is sufficient for typical applications. We tested, in computational experiments, how well the real-life InChIKey collision resistance corresponds to the theoretical estimates expected by design. For this purpose, we analyzed the statistical characteristics of InChIKey for datasets of variable size in comparison to the theoretical statistical frequencies. For the relatively short second block, an exhaustive direct testing was performed. We computed and compared to theory the numbers of collisions for the stereoisomers of Spongistatin I (using the whole set of 67,108,864 isomers and its subsets). For the longer first block, we generated, using custom-made software, InChIKeys for more than 3 × 10^10 chemical structures. The statistical behavior of this block was tested by comparison of experimental and theoretical frequencies for the various four-letter sequences which may appear in the first block body. From the results of our computational experiments we conclude that the observed characteristics of InChIKey collision resistance are in good agreement with theoretical expectations. |
format | Online Article Text |
id | pubmed-3558395 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3558395 2013-01-31 InChIKey collision resistance: an experimental testing Pletnev, Igor; Erin, Andrey; McNaught, Alan; Blinov, Kirill; Tchekhovskoi, Dmitrii; Heller, Steve J Cheminform Research Article InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isomerism (stereo, tautomeric, etc.). InChIKey is designed to be a nearly unique substitute for the parent InChI. However, a single InChIKey may occasionally map to two or more InChI strings (collision). The appearance of collision itself does not compromise the signature as collision-free hashing is impossible; the only viable approach is to set and keep a reasonable level of collision resistance which is sufficient for typical applications. We tested, in computational experiments, how well the real-life InChIKey collision resistance corresponds to the theoretical estimates expected by design. For this purpose, we analyzed the statistical characteristics of InChIKey for datasets of variable size in comparison to the theoretical statistical frequencies. For the relatively short second block, an exhaustive direct testing was performed. We computed and compared to theory the numbers of collisions for the stereoisomers of Spongistatin I (using the whole set of 67,108,864 isomers and its subsets). For the longer first block, we generated, using custom-made software, InChIKeys for more than 3 × 10^10 chemical structures. The statistical behavior of this block was tested by comparison of experimental and theoretical frequencies for the various four-letter sequences which may appear in the first block body. From the results of our computational experiments we conclude that the observed characteristics of InChIKey collision resistance are in good agreement with theoretical expectations. BioMed Central 2012-12-20 /pmc/articles/PMC3558395/ /pubmed/23256896 http://dx.doi.org/10.1186/1758-2946-4-39 Text en Copyright ©2012 Pletnev et al.; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Pletnev, Igor Erin, Andrey McNaught, Alan Blinov, Kirill Tchekhovskoi, Dmitrii Heller, Steve InChIKey collision resistance: an experimental testing |
title | InChIKey collision resistance: an experimental testing |
title_full | InChIKey collision resistance: an experimental testing |
title_fullStr | InChIKey collision resistance: an experimental testing |
title_full_unstemmed | InChIKey collision resistance: an experimental testing |
title_short | InChIKey collision resistance: an experimental testing |
title_sort | inchikey collision resistance: an experimental testing |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3558395/ https://www.ncbi.nlm.nih.gov/pubmed/23256896 http://dx.doi.org/10.1186/1758-2946-4-39 |
work_keys_str_mv | AT pletnevigor inchikeycollisionresistanceanexperimentaltesting AT erinandrey inchikeycollisionresistanceanexperimentaltesting AT mcnaughtalan inchikeycollisionresistanceanexperimentaltesting AT blinovkirill inchikeycollisionresistanceanexperimentaltesting AT tchekhovskoidmitrii inchikeycollisionresistanceanexperimentaltesting AT hellersteve inchikeycollisionresistanceanexperimentaltesting |
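
Note on the theoretical estimate mentioned in the description: the record does not say how the expected collision counts were derived, so the following is only a minimal sketch of the standard birthday-style approximation for a uniformly distributed hash. The function name `expected_colliding_pairs` and the `ASSUMED_HASH_BITS` value are illustrative assumptions and do not come from this record; only the isomer count of 67,108,864 is taken from the description.

```python
def expected_colliding_pairs(n_items: int, hash_bits: int) -> float:
    """Expected number of colliding pairs among n_items values drawn
    uniformly at random from a space of 2**hash_bits possible hashes."""
    # Number of unordered pairs of distinct items.
    pairs = n_items * (n_items - 1) // 2
    # Each pair collides with probability 1 / 2**hash_bits.
    return pairs / 2 ** hash_bits


# Size of the full Spongistatin I stereoisomer set, from the description.
N_ISOMERS = 67_108_864

# ASSUMPTION: effective number of hashed bits in the second block.
# This record does not state the value; change it to match the specification.
ASSUMED_HASH_BITS = 37

if __name__ == "__main__":
    estimate = expected_colliding_pairs(N_ISOMERS, ASSUMED_HASH_BITS)
    print(f"Expected colliding pairs (assumed {ASSUMED_HASH_BITS} bits): {estimate:.1f}")
```

The estimate scales with the square of the dataset size and halves with every extra hash bit, which is why an exhaustive test is feasible for a short block while a longer block calls for a statistical check instead.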