
The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching

BACKGROUND: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics a...


Bibliographic Details
Main authors: Willighagen, Egon L., Mayfield, John W., Alvarsson, Jonathan, Berg, Arvid, Carlsson, Lars, Jeliazkova, Nina, Kuhn, Stefan, Pluskal, Tomáš, Rojas-Chertó, Miquel, Spjuth, Ola, Torrance, Gilleain, Evelo, Chris T., Guha, Rajarshi, Steinbeck, Christoph
Format: Online Article Text
Language: English
Published: Springer International Publishing 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5461230/
https://www.ncbi.nlm.nih.gov/pubmed/29086040
http://dx.doi.org/10.1186/s13321-017-0220-4
author Willighagen, Egon L.
Mayfield, John W.
Alvarsson, Jonathan
Berg, Arvid
Carlsson, Lars
Jeliazkova, Nina
Kuhn, Stefan
Pluskal, Tomáš
Rojas-Chertó, Miquel
Spjuth, Ola
Torrance, Gilleain
Evelo, Chris T.
Guha, Rajarshi
Steinbeck, Christoph
author_sort Willighagen, Egon L.
collection PubMed
description BACKGROUND: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, however, the code base has grown significantly, resulting in many complex interdependencies among components and poor performance of many algorithms. RESULTS: We report improvements to the CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance. We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvements to existing functionality that have led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. CONCLUSIONS: This paper highlights our continued efforts to provide a community-driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer-reviewed publishing platform for scientific computing software. [Figure: see text] ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-017-0220-4) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5461230
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-5461230 2017-06-22 J Cheminform Software Springer International Publishing 2017-06-06 /pmc/articles/PMC5461230/ /pubmed/29086040 http://dx.doi.org/10.1186/s13321-017-0220-4 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching
topic Software