
Validation and extraction of molecular-geometry information from small-molecule databases

A freely available small-molecule structure database, the Crystallography Open Database (COD), is used for the extraction of molecular-geometry information on small-molecule compounds. The results are used for the generation of new ligand descriptions, which are subsequently used by macromolecular m...


Bibliographic Details
Main authors: Long, Fei; Nicholls, Robert A.; Emsley, Paul; Gražulis, Saulius; Merkys, Andrius; Vaitkus, Antanas; Murshudov, Garib N.
Format: Online Article Text
Language: English
Published: International Union of Crystallography 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5297913/
https://www.ncbi.nlm.nih.gov/pubmed/28177306
http://dx.doi.org/10.1107/S2059798317000079
_version_ 1782505805261570048
author Long, Fei
Nicholls, Robert A.
Emsley, Paul
Gražulis, Saulius
Merkys, Andrius
Vaitkus, Antanas
Murshudov, Garib N.
author_facet Long, Fei
Nicholls, Robert A.
Emsley, Paul
Gražulis, Saulius
Merkys, Andrius
Vaitkus, Antanas
Murshudov, Garib N.
author_sort Long, Fei
collection PubMed
description A freely available small-molecule structure database, the Crystallography Open Database (COD), is used for the extraction of molecular-geometry information on small-molecule compounds. The results are used for the generation of new ligand descriptions, which are subsequently used by macromolecular model-building and structure-refinement software. To increase the reliability of the derived data, and therefore the new ligand descriptions, the entries from this database were subjected to very strict validation. The selection criteria made sure that the crystal structures used to derive atom types, bond and angle classes are of sufficiently high quality. Any suspicious entries at a crystal or molecular level were removed from further consideration. The selection criteria included (i) the resolution of the data used for refinement (entries solved at 0.84 Å resolution or higher) and (ii) the structure-solution method (structures must be from a single-crystal experiment and all atoms of generated molecules must have full occupancies), as well as basic sanity checks such as (iii) consistency between the valences and the number of connections between atoms, (iv) acceptable bond-length deviations from the expected values and (v) detection of atomic collisions. The derived atom types and bond classes were then validated using high-order moment-based statistical techniques. The results of the statistical analyses were fed back to fine-tune the atom typing. The developed procedure was repeated four times, resulting in fine-grained atom typing, bond and angle classes. The procedure will be repeated in the future as and when new entries are deposited in the COD. The whole procedure can also be applied to any source of small-molecule structures, including the Cambridge Structural Database and the ZINC database.
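The five selection criteria listed in the description above can be read as a simple filter over candidate entries. The following is a minimal sketch of that reading, not the authors' implementation: the `Entry` record, its field names, the helper functions and the bond-deviation threshold are all hypothetical, and only the 0.84 Å resolution limit is taken from the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    """Hypothetical per-entry summary; field names do not mirror the COD schema."""
    entry_id: str
    resolution: float           # in Å; a smaller value means higher resolution
    single_crystal: bool        # True if the structure is from a single-crystal experiment
    occupancies: List[float]    # occupancy of every atom in the generated molecules
    valence_consistent: bool    # valences agree with the number of connections
    max_bond_deviation: float   # Å, largest |observed - expected| bond length
    has_collisions: bool        # True if atomic collisions were detected


RESOLUTION_LIMIT = 0.84      # Å, stated in the abstract
BOND_DEVIATION_LIMIT = 0.1   # Å, ASSUMED placeholder; the abstract gives no value


def failed_criteria(entry: Entry) -> List[str]:
    """Return labels of the criteria (i)-(v) that the entry fails."""
    failures = []
    # (i) "0.84 Å resolution or higher" means a d-spacing of at most 0.84 Å.
    if entry.resolution > RESOLUTION_LIMIT:
        failures.append("(i) resolution")
    # (ii) single-crystal experiment and full occupancy for all atoms.
    if not entry.single_crystal or any(o < 1.0 for o in entry.occupancies):
        failures.append("(ii) method/occupancy")
    # (iii) valence versus connectivity consistency.
    if not entry.valence_consistent:
        failures.append("(iii) valence")
    # (iv) bond lengths close enough to expected values (threshold assumed).
    if entry.max_bond_deviation > BOND_DEVIATION_LIMIT:
        failures.append("(iv) bond deviation")
    # (v) no atomic collisions.
    if entry.has_collisions:
        failures.append("(v) collisions")
    return failures


def select(entries: List[Entry]) -> List[Entry]:
    """Keep only entries that pass every criterion."""
    return [e for e in entries if not failed_criteria(e)]
```

Returning the list of failed criteria, rather than a single boolean, makes it possible to report why an entry was removed from further consideration.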
format Online
Article
Text
id pubmed-5297913
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher International Union of Crystallography
record_format MEDLINE/PubMed
spelling pubmed-5297913 2017-02-17 Validation and extraction of molecular-geometry information from small-molecule databases Long, Fei Nicholls, Robert A. Emsley, Paul Gražulis, Saulius Merkys, Andrius Vaitkus, Antanas Murshudov, Garib N. Acta Crystallogr D Struct Biol Research Papers A freely available small-molecule structure database, the Crystallography Open Database (COD), is used for the extraction of molecular-geometry information on small-molecule compounds. The results are used for the generation of new ligand descriptions, which are subsequently used by macromolecular model-building and structure-refinement software. To increase the reliability of the derived data, and therefore the new ligand descriptions, the entries from this database were subjected to very strict validation. The selection criteria made sure that the crystal structures used to derive atom types, bond and angle classes are of sufficiently high quality. Any suspicious entries at a crystal or molecular level were removed from further consideration. The selection criteria included (i) the resolution of the data used for refinement (entries solved at 0.84 Å resolution or higher) and (ii) the structure-solution method (structures must be from a single-crystal experiment and all atoms of generated molecules must have full occupancies), as well as basic sanity checks such as (iii) consistency between the valences and the number of connections between atoms, (iv) acceptable bond-length deviations from the expected values and (v) detection of atomic collisions. The derived atom types and bond classes were then validated using high-order moment-based statistical techniques. The results of the statistical analyses were fed back to fine-tune the atom typing. The developed procedure was repeated four times, resulting in fine-grained atom typing, bond and angle classes. The procedure will be repeated in the future as and when new entries are deposited in the COD. The whole procedure can also be applied to any source of small-molecule structures, including the Cambridge Structural Database and the ZINC database. International Union of Crystallography 2017-02-01 /pmc/articles/PMC5297913/ /pubmed/28177306 http://dx.doi.org/10.1107/S2059798317000079 Text en © Long et al. 2017 http://creativecommons.org/licenses/by/2.0/uk/ This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
spellingShingle Research Papers
Long, Fei
Nicholls, Robert A.
Emsley, Paul
Gražulis, Saulius
Merkys, Andrius
Vaitkus, Antanas
Murshudov, Garib N.
Validation and extraction of molecular-geometry information from small-molecule databases
title Validation and extraction of molecular-geometry information from small-molecule databases
title_full Validation and extraction of molecular-geometry information from small-molecule databases
title_fullStr Validation and extraction of molecular-geometry information from small-molecule databases
title_full_unstemmed Validation and extraction of molecular-geometry information from small-molecule databases
title_short Validation and extraction of molecular-geometry information from small-molecule databases
title_sort validation and extraction of molecular-geometry information from small-molecule databases
topic Research Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5297913/
https://www.ncbi.nlm.nih.gov/pubmed/28177306
http://dx.doi.org/10.1107/S2059798317000079
work_keys_str_mv AT longfei validationandextractionofmoleculargeometryinformationfromsmallmoleculedatabases
AT nichollsroberta validationandextractionofmoleculargeometryinformationfromsmallmoleculedatabases
AT emsleypaul validationandextractionofmoleculargeometryinformationfromsmallmoleculedatabases
AT grazulissaulius validationandextractionofmoleculargeometryinformationfromsmallmoleculedatabases
AT merkysandrius validationandextractionofmoleculargeometryinformationfromsmallmoleculedatabases
AT vaitkusantanas validationandextractionofmoleculargeometryinformationfromsmallmoleculedatabases
AT murshudovgaribn validationandextractionofmoleculargeometryinformationfromsmallmoleculedatabases