
Predicting fungal secondary metabolite activity from biosynthetic gene cluster data using machine learning

Fungal secondary metabolites (SMs) play a significant role in the diversity of ecological communities, niches, and lifestyles in the fungal kingdom. Many fungal SMs have medically and industrially important properties including antifungal, antibacterial, and antitumor activity, and a single metaboli...


Bibliographic Details
Main Authors: Riedling, Olivia, Walker, Allison S., Rokas, Antonis
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10515863/
https://www.ncbi.nlm.nih.gov/pubmed/37745539
http://dx.doi.org/10.1101/2023.09.12.557468
author Riedling, Olivia
Walker, Allison S.
Rokas, Antonis
collection PubMed
description Fungal secondary metabolites (SMs) play a significant role in the diversity of ecological communities, niches, and lifestyles in the fungal kingdom. Many fungal SMs have medically and industrially important properties including antifungal, antibacterial, and antitumor activity, and a single metabolite can display multiple types of bioactivities. The genes necessary for fungal SM biosynthesis are typically found in a single genomic region forming biosynthetic gene clusters (BGCs). However, whether fungal SM bioactivity can be predicted from specific attributes of genes in BGCs remains an open question. We adapted previously used machine learning models for predicting SM bioactivity from bacterial BGC data to fungal BGC data. We trained our models to predict antibacterial, antifungal, and cytotoxic/antitumor bioactivity on two datasets: 1) fungal BGCs (dataset comprised of 314 BGCs), and 2) fungal (314 BGCs) and bacterial BGCs (1,003 BGCs); the second dataset was our control since a previous study using just the bacterial BGC data yielded prediction accuracies as high as 80%. We found that the models trained only on fungal BGCs had balanced accuracies between 51-68%, whereas training on bacterial and fungal BGCs yielded balanced accuracies between 61-74%. The lower accuracy of the predictions from fungal data likely stems from the small number of BGCs and SMs with known bioactivity; this lack of data currently limits the application of machine learning approaches in studying fungal secondary metabolism. However, our data also suggest that machine learning approaches trained on bacterial and fungal data can predict SM bioactivity with good accuracy. With more than 15,000 characterized fungal SMs, millions of putative BGCs present in fungal genomes, and increased demand for novel drugs, efforts that systematically link fungal SM bioactivity to BGCs are urgently needed.
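The description above reports balanced accuracies rather than plain accuracies. As a minimal sketch, assuming the standard binary definition (the mean of sensitivity and specificity), the snippet below shows why the distinction matters when few compounds in a dataset have a given bioactivity. The function name and the toy labels are hypothetical illustrations, not data or code from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # actives predicted active
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # actives missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # inactives predicted inactive
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # inactives flagged as active
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy labels: 2 of 10 clusters produce an active compound.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_inactive = [0] * 10  # a model that never predicts activity

print(accuracy(y_true, always_inactive))           # 0.8
print(balanced_accuracy(y_true, always_inactive))  # 0.5
```

In this toy case a model that never predicts activity is correct 80% of the time yet has a balanced accuracy of 50%, the same as chance for a two-class problem. Read this way, the reported lower bound of 51% for the fungal-only models sits close to that chance level.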
format Online
Article
Text
id pubmed-10515863
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10515863 2023-09-23 bioRxiv Article Cold Spring Harbor Laboratory 2023-09-12 /pmc/articles/PMC10515863/ /pubmed/37745539 http://dx.doi.org/10.1101/2023.09.12.557468 Text en https://creativecommons.org/licenses/by-nc/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator.
title Predicting fungal secondary metabolite activity from biosynthetic gene cluster data using machine learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10515863/
https://www.ncbi.nlm.nih.gov/pubmed/37745539
http://dx.doi.org/10.1101/2023.09.12.557468