
Comparative analysis of chemical similarity methods for modular natural products with a hypothetical structure enumeration algorithm

Natural products represent a prominent source of pharmaceutically and industrially important agents. Calculating the chemical similarity of two molecules is a central task in cheminformatics, with applications at multiple stages of the drug discovery pipeline. Quantifying the similarity of natural p...


Bibliographic Details
Main authors: Skinnider, Michael A., Dejong, Chris A., Franczak, Brian C., McNicholas, Paul D., Magarvey, Nathan A.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5559407/
https://www.ncbi.nlm.nih.gov/pubmed/29086195
http://dx.doi.org/10.1186/s13321-017-0234-y
author Skinnider, Michael A.
Dejong, Chris A.
Franczak, Brian C.
McNicholas, Paul D.
Magarvey, Nathan A.
author_sort Skinnider, Michael A.
collection PubMed
description Natural products represent a prominent source of pharmaceutically and industrially important agents. Calculating the chemical similarity of two molecules is a central task in cheminformatics, with applications at multiple stages of the drug discovery pipeline. Quantifying the similarity of natural products is a particularly important problem, as the biological activities of these molecules have been extensively optimized by natural selection. The large and structurally complex scaffolds of natural products distinguish their physical and chemical properties from those of synthetic compounds. However, no analysis of the performance of existing methods for molecular similarity calculation specific to natural products has been reported to date. Here, we present LEMONS, an algorithm for the enumeration of hypothetical modular natural product structures. We leverage this algorithm to conduct a comparative analysis of molecular similarity methods within the unique chemical space occupied by modular natural products using controlled synthetic data, and comprehensively investigate the impact of diverse biosynthetic parameters on similarity search. We additionally investigate a recently described algorithm for natural product retrobiosynthesis and alignment, and find that when rule-based retrobiosynthesis can be applied, this approach outperforms conventional two-dimensional fingerprints, suggesting it may represent a valuable approach for the targeted exploration of natural product chemical space and microbial genome mining. Our open-source algorithm is an extensible method of enumerating hypothetical natural product structures with diverse potential applications in bioinformatics. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-017-0234-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5559407
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-5559407 2017-08-31 J Cheminform Research Article Springer International Publishing 2017-08-16 /pmc/articles/PMC5559407/ /pubmed/29086195 http://dx.doi.org/10.1186/s13321-017-0234-y Text en
© The Author(s) 2017. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Comparative analysis of chemical similarity methods for modular natural products with a hypothetical structure enumeration algorithm
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5559407/
https://www.ncbi.nlm.nih.gov/pubmed/29086195
http://dx.doi.org/10.1186/s13321-017-0234-y