Tautomerism in large databases
We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e. not computer-generated) compo...
Main authors: | Sitzmann, Markus; Ihlenfeldt, Wolf-Dietrich; Nicklaus, Marc C. |
---|---|
Format: | Text |
Language: | English |
Published: | Springer Netherlands 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2886898/ https://www.ncbi.nlm.nih.gov/pubmed/20512400 http://dx.doi.org/10.1007/s10822-010-9346-4 |
_version_ | 1782182491993407488 |
---|---|
author | Sitzmann, Markus; Ihlenfeldt, Wolf-Dietrich; Nicklaus, Marc C. |
author_facet | Sitzmann, Markus; Ihlenfeldt, Wolf-Dietrich; Nicklaus, Marc C. |
author_sort | Sitzmann, Markus |
collection | PubMed |
description | We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e. not computer-generated) compounds. This analysis was carried out using calculable chemical structure identifiers developed by the NCI CADD Group, based on hash codes available in the chemoinformatics toolkit CACTVS and a newly developed scoring scheme to define a canonical tautomer for any encountered structure. CACTVS’s tautomerism definition, a set of 21 transform rules expressed in SMIRKS line notation, was used, which takes a comprehensive stance as to the possible types of tautomeric interconversion included. Tautomerism was found to be possible for more than 2/3 of the unique structures in the CSDB. A total of 680 million tautomers were calculated from, and including, the original structure records. Tautomerism overlap within the same individual database (i.e. at least one other entry was present that was really only a different tautomeric representation of the same compound) was found at an average rate of 0.3% of the original structure records, with values as high as nearly 2% for some of the databases in CSDB. Projected onto the set of unique structures (by FICuS identifier), this still occurred in about 1.5% of the cases. Tautomeric overlap across all constituent databases in CSDB was found for nearly 10% of the records in the collection. |
format | Text |
id | pubmed-2886898 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Springer Netherlands |
record_format | MEDLINE/PubMed |
spelling | pubmed-2886898 2010-07-21 Tautomerism in large databases Sitzmann, Markus Ihlenfeldt, Wolf-Dietrich Nicklaus, Marc C. J Comput Aided Mol Des Article
Springer Netherlands 2010-05-29 2010 /pmc/articles/PMC2886898/ /pubmed/20512400 http://dx.doi.org/10.1007/s10822-010-9346-4 Text en © The Author(s) 2010 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited. |
spellingShingle | Article Sitzmann, Markus Ihlenfeldt, Wolf-Dietrich Nicklaus, Marc C. Tautomerism in large databases |
title | Tautomerism in large databases |
title_full | Tautomerism in large databases |
title_fullStr | Tautomerism in large databases |
title_full_unstemmed | Tautomerism in large databases |
title_short | Tautomerism in large databases |
title_sort | tautomerism in large databases |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2886898/ https://www.ncbi.nlm.nih.gov/pubmed/20512400 http://dx.doi.org/10.1007/s10822-010-9346-4 |
work_keys_str_mv | AT sitzmannmarkus tautomerisminlargedatabases AT ihlenfeldtwolfdietrich tautomerisminlargedatabases AT nicklausmarcc tautomerisminlargedatabases |
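
The description above reports tautomeric overlap as the share of records for which at least one other entry is a different tautomeric representation of the same compound. A minimal Python sketch of that bookkeeping follows. It is not the CACTVS or NCI CADD implementation: it assumes each record already carries two precomputed keys, a tautomer-sensitive `structure_key` and a tautomer-invariant `tautomer_key`, and both names, the function name, and the sample data are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


def tautomer_overlap_rate(records):
    """Return the fraction of records that have a tautomeric duplicate.

    records: iterable of (database, structure_key, tautomer_key) tuples.
    All three fields are hypothetical, precomputed strings; this sketch
    does not enumerate tautomers or compute any identifier itself.
    """
    records = list(records)
    # Map (database, tautomer_key) -> set of distinct structure keys seen.
    forms = defaultdict(set)
    for database, structure_key, tautomer_key in records:
        forms[(database, tautomer_key)].add(structure_key)
    # A record overlaps if its group contains more than one distinct
    # structure key, i.e. another tautomeric form exists in that database.
    overlapping = sum(
        1
        for database, _, tautomer_key in records
        if len(forms[(database, tautomer_key)]) > 1
    )
    return overlapping / len(records) if records else 0.0


# Illustrative data only: 2-hydroxypyridine and 2-pyridone are tautomers,
# so they share the placeholder tautomer key "T1".
example = [
    ("db_a", "pyridin-2-ol", "T1"),
    ("db_a", "pyridin-2(1H)-one", "T1"),
    ("db_a", "benzene", "T2"),
    ("db_b", "pyridin-2-ol", "T1"),
]
print(tautomer_overlap_rate(example))  # 0.5
```

Grouping by `(database, tautomer_key)` gives the within-database rate. Assigning every record the same database label before calling the function would give the rate across the whole collection instead.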