The Chemical Space of Terpenes: Insights from Data Science and AI

Terpenes are a widespread class of natural products with significant chemical and biological diversity, and many of these molecules have already made their way into medicines. In this work, we employ a data science-based approach to identify, compile, and characterize the diversity of terpenes curre...

Bibliographic Details
Main Authors: Hosseini, Morteza; Pereira, David M.
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9961535/
https://www.ncbi.nlm.nih.gov/pubmed/37259351
http://dx.doi.org/10.3390/ph16020202
author Hosseini, Morteza
Pereira, David M.
collection PubMed
description Terpenes are a widespread class of natural products with significant chemical and biological diversity, and many of these molecules have already made their way into medicines. In this work, we employ a data science-based approach to identify, compile, and characterize the diversity of terpenes currently known in a systematic way, in a total of 59,833 molecules. We also employed several methods for the purpose of classifying terpene subclasses using their physicochemical descriptors. Light gradient boosting machine, k-nearest neighbours, random forests, Gaussian naïve Bayes and Multilayer perceptron were tested, with the best-performing algorithms yielding accuracy, F1 score, precision and other metrics all over 0.9, thus showing the capabilities of these approaches for the classification of terpene subclasses. These results can be important for the field of phytochemistry and pharmacognosy, as they allow the prediction of the subclass of novel terpene molecules, even when biosynthetic studies are not available.
format Online
Article
Text
id pubmed-9961535
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9961535 2023-02-26 The Chemical Space of Terpenes: Insights from Data Science and AI Hosseini, Morteza Pereira, David M. Pharmaceuticals (Basel) Article Terpenes are a widespread class of natural products with significant chemical and biological diversity, and many of these molecules have already made their way into medicines. In this work, we employ a data science-based approach to identify, compile, and characterize the diversity of terpenes currently known in a systematic way, in a total of 59,833 molecules. We also employed several methods for the purpose of classifying terpene subclasses using their physicochemical descriptors. Light gradient boosting machine, k-nearest neighbours, random forests, Gaussian naïve Bayes and Multilayer perceptron were tested, with the best-performing algorithms yielding accuracy, F1 score, precision and other metrics all over 0.9, thus showing the capabilities of these approaches for the classification of terpene subclasses. These results can be important for the field of phytochemistry and pharmacognosy, as they allow the prediction of the subclass of novel terpene molecules, even when biosynthetic studies are not available. MDPI 2023-01-29 /pmc/articles/PMC9961535/ /pubmed/37259351 http://dx.doi.org/10.3390/ph16020202 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title The Chemical Space of Terpenes: Insights from Data Science and AI
topic Article