Tautomerism in large databases

Bibliographic Details
Main authors: Sitzmann, Markus, Ihlenfeldt, Wolf-Dietrich, Nicklaus, Marc C.
Format: Text
Language: English
Published: Springer Netherlands 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2886898/
https://www.ncbi.nlm.nih.gov/pubmed/20512400
http://dx.doi.org/10.1007/s10822-010-9346-4
author Sitzmann, Markus
Ihlenfeldt, Wolf-Dietrich
Nicklaus, Marc C.
collection PubMed
description We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e. not computer-generated) compounds. This analysis was carried out using calculable chemical structure identifiers developed by the NCI CADD Group, based on hash codes available in the chemoinformatics toolkit CACTVS, and a newly developed scoring scheme to define a canonical tautomer for any encountered structure. We used CACTVS's tautomerism definition, a set of 21 transform rules expressed in SMIRKS line notation, which takes a comprehensive stance on the types of tautomeric interconversion included. Tautomerism was found to be possible for more than 2/3 of the unique structures in the CSDB. A total of 680 million tautomers were calculated from, and including, the original structure records. Tautomeric overlap within the same individual database (i.e. at least one other entry was present that was really only a different tautomeric representation of the same compound) was found at an average rate of 0.3% of the original structure records, with values as high as nearly 2% for some of the databases in CSDB. Projected onto the set of unique structures (by FICuS identifier), this still occurred in about 1.5% of the cases. Tautomeric overlap across all constituent databases in CSDB was found for nearly 10% of the records in the collection.
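The description combines two steps: picking one canonical tautomer per structure by a score, and counting records that another entry duplicates only tautomerically. The Python sketch below illustrates both under stated assumptions. It does not use CACTVS and does not reproduce the NCI scoring scheme. The score function `prefers_carbonyl` is a hypothetical stand-in. `structure_key` stands for a tautomer-sensitive identifier and `tautomer_key` for a tautomer-invariant one, such as a hash of the canonical tautomer. Both are hypothetical labels. The enumerated tautomers are assumed to be supplied by some external toolkit. The within-database test follows the description's parenthetical definition. The collection-wide count is one plausible reading of "overlap across all constituent databases".

```python
from collections import defaultdict
from typing import Callable, Iterable


def canonical_tautomer(tautomers: Iterable[str], score: Callable[[str], float]) -> str:
    """Pick the highest-scoring tautomer; ties go to the lexicographically
    smallest string so the choice is deterministic (hypothetical scheme)."""
    return min(tautomers, key=lambda t: (-score(t), t))


def overlap_rates(records):
    """records: list of (database, structure_key, tautomer_key) tuples.

    A record counts as overlapping if some other record shares its
    tautomer_key but has a different structure_key. Exact duplicates
    (same structure_key) therefore do not count.
    Returns (per-database within-DB rates, collection-wide rate).
    """
    per_db = defaultdict(set)     # (db, tautomer_key) -> distinct structure keys
    collection = defaultdict(set)  # tautomer_key -> distinct structure keys
    totals = defaultdict(int)      # db -> number of records
    for db, skey, tkey in records:
        per_db[(db, tkey)].add(skey)
        collection[tkey].add(skey)
        totals[db] += 1

    within = defaultdict(int)
    across = 0
    for db, skey, tkey in records:
        # The set already contains this record's own key, so size > 1
        # means a different tautomeric representation exists.
        if len(per_db[(db, tkey)]) > 1:
            within[db] += 1
        if len(collection[tkey]) > 1:
            across += 1

    rates = {db: within[db] / totals[db] for db in totals}
    return rates, across / len(records)


def prefers_carbonyl(smiles: str) -> float:
    """Hypothetical score: favour representations with an explicit C=O."""
    return 1.0 if smiles.startswith("O=") or "=O" in smiles else 0.0


# Toy data: 2-hydroxypyridine / 2-pyridone share tautomer key "T1".
records = [
    ("dbA", "Oc1ccccn1", "T1"),
    ("dbA", "O=c1cccc[nH]1", "T1"),
    ("dbA", "CCO", "T2"),
    ("dbA", "CCO", "T2"),  # exact duplicate, not a tautomeric overlap
    ("dbB", "Oc1ccccn1", "T1"),
    ("dbB", "c1ccccc1", "T3"),
]

if __name__ == "__main__":
    print(canonical_tautomer(["Oc1ccccn1", "O=c1cccc[nH]1"], prefers_carbonyl))
    print(overlap_rates(records))
```

In this toy set, half of `dbA`'s records overlap tautomerically. `dbB` has none internally, but its 2-hydroxypyridine entry overlaps with `dbA` at the collection level. This is why a collection-wide rate can exceed the per-database rates, as the reported figures (about 0.3% vs. nearly 10%) suggest.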
format Text
id pubmed-2886898
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Springer Netherlands
record_format MEDLINE/PubMed
spelling pubmed-2886898 2010-07-21 Tautomerism in large databases Sitzmann, Markus Ihlenfeldt, Wolf-Dietrich Nicklaus, Marc C. J Comput Aided Mol Des Article
Springer Netherlands 2010-05-29 2010 /pmc/articles/PMC2886898/ /pubmed/20512400 http://dx.doi.org/10.1007/s10822-010-9346-4 Text en © The Author(s) 2010 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
title Tautomerism in large databases
topic Article