
Tautomerism in large databases

We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e. not computer-generated) compo...


Bibliographic Details
Main Authors:	Sitzmann, Markus; Ihlenfeldt, Wolf-Dietrich; Nicklaus, Marc C.
Format:	Text
Language:	English
Published:	Springer Netherlands 2010
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2886898/ https://www.ncbi.nlm.nih.gov/pubmed/20512400 http://dx.doi.org/10.1007/s10822-010-9346-4

author	Sitzmann, Markus; Ihlenfeldt, Wolf-Dietrich; Nicklaus, Marc C.
author_sort	Sitzmann, Markus
collection	PubMed
description	We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e. not computer-generated) compounds. This analysis was carried out using calculable chemical structure identifiers developed by the NCI CADD Group, based on hash codes available in the chemoinformatics toolkit CACTVS and a newly developed scoring scheme to define a canonical tautomer for any encountered structure. CACTVS’s tautomerism definition, a set of 21 transform rules expressed in SMIRKS line notation, was used, which takes a comprehensive stance as to the possible types of tautomeric interconversion included. Tautomerism was found to be possible for more than 2/3 of the unique structures in the CSDB. A total of 680 million tautomers were calculated from, and including, the original structure records. Tautomerism overlap within the same individual database (i.e. at least one other entry was present that was really only a different tautomeric representation of the same compound) was found at an average rate of 0.3% of the original structure records, with values as high as nearly 2% for some of the databases in CSDB. Projected onto the set of unique structures (by FICuS identifier), this still occurred in about 1.5% of the cases. Tautomeric overlap across all constituent databases in CSDB was found for nearly 10% of the records in the collection.
format	Text
id	pubmed-2886898
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	Springer Netherlands
record_format	MEDLINE/PubMed
spelling	pubmed-2886898 2010-07-21 Tautomerism in large databases Sitzmann, Markus; Ihlenfeldt, Wolf-Dietrich; Nicklaus, Marc C. J Comput Aided Mol Des Article Springer Netherlands 2010-05-29 2010 /pmc/articles/PMC2886898/ /pubmed/20512400 http://dx.doi.org/10.1007/s10822-010-9346-4 Text en © The Author(s) 2010 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
title	Tautomerism in large databases
topic	Article
