
iPRESTO: Automated discovery of biosynthetic sub-clusters linked to specific natural product substructures


Bibliographic Details
Main authors: Louwen, Joris J. R., Kautsar, Satria A., van der Burg, Sven, Medema, Marnix H., van der Hooft, Justin J. J.
Format: Online; Article; Text
Language: English
Published: Public Library of Science 2023
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9946207/
https://www.ncbi.nlm.nih.gov/pubmed/36758069
http://dx.doi.org/10.1371/journal.pcbi.1010462
Collection: PubMed
Description:
Microbial specialised metabolism is full of valuable natural products that are applied clinically, agriculturally, and industrially. The genes that encode their biosynthesis are often physically clustered on the genome in biosynthetic gene clusters (BGCs). Many BGCs consist of multiple groups of co-evolving genes called sub-clusters that are responsible for the biosynthesis of a specific chemical moiety in a natural product. Sub-clusters therefore provide an important link between the structures of a natural product and its BGC, which can be leveraged for predicting natural product structures from sequence, as well as for linking chemical structures and metabolomics-derived mass features to BGCs. While some initial computational methodologies have been devised for sub-cluster detection, current approaches are not scalable, have only been run on small and outdated datasets, or produce an impractically large number of possible sub-clusters to mine through. Here, we constructed a scalable method for unsupervised sub-cluster detection, called iPRESTO, based on topic modelling and statistical analysis of co-occurrence patterns of enzyme-coding protein families. iPRESTO was used to mine sub-clusters across 150,000 prokaryotic BGCs from antiSMASH-DB. After annotating a fraction of the resulting sub-cluster families, we could predict a substructure for 16% of the antiSMASH-DB BGCs. Additionally, our method was able to confirm 83% of the experimentally characterised sub-clusters in MIBiG reference BGCs. Based on iPRESTO-detected sub-clusters, we could correctly identify the BGCs for xenorhabdin and salbostatin biosynthesis (which had not yet been annotated in BGC databases), as well as propose a candidate BGC for akashin biosynthesis. Additionally, we show for a collection of 145 actinobacteria how substructures can aid in linking BGCs to molecules by correlating iPRESTO-detected sub-clusters to MS/MS-derived Mass2Motifs substructure patterns. This work paves the way for deeper functional and structural annotation of microbial BGCs by improved linking of orphan molecules to their cognate gene clusters, thus facilitating accelerated natural product discovery.
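The abstract names "statistical analysis of co-occurrence patterns of enzyme-coding protein families" as one ingredient of sub-cluster detection without spelling out what such an analysis looks like. As a rough, hypothetical illustration only, the sketch below scores pairs of protein families by whether they co-occur across BGCs more often than chance would predict, using a one-sided hypergeometric test with a Bonferroni correction. This is a simplified stand-in, not iPRESTO's own statistical procedure, and it leaves out the topic-modelling component entirely; the input format (each BGC as a set of protein-family identifiers) and the function names `hypergeom_sf` and `cooccurring_pairs` are assumptions made for the example.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations
from math import comb


def hypergeom_sf(k: int, total: int, n_a: int, n_b: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(total, n_a, n_b).

    X counts the BGCs containing both families if family B's n_b
    occurrences were spread at random over `total` BGCs, n_a of which
    contain family A.
    """
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, i) * comb(total - n_a, n_b - i)
               for i in range(k, upper + 1))
    return tail / comb(total, n_b)


def cooccurring_pairs(bgcs: list[set[str]], alpha: float = 0.05):
    """Return (fam_a, fam_b, n_both, p) for family pairs that co-occur more
    often than expected by chance, Bonferroni-corrected over all possible pairs.

    Hypothetical helper for illustration; not part of iPRESTO.
    """
    total = len(bgcs)
    # Number of BGCs in which each protein family occurs.
    single = Counter(fam for bgc in bgcs for fam in bgc)
    # Number of BGCs in which each (alphabetically sorted) pair occurs together.
    pair = Counter(p for bgc in bgcs for p in combinations(sorted(bgc), 2))
    # Bonferroni: divide alpha by the number of possible family pairs.
    n_tests = max(1, len(single) * (len(single) - 1) // 2)
    hits = []
    for (a, b), k in pair.items():
        p = hypergeom_sf(k, total, single[a], single[b])
        if p * n_tests <= alpha:
            hits.append((a, b, k, p))
    # Most significant pairs first.
    return sorted(hits, key=lambda h: h[3])
```

For example, with six toy BGCs containing {"famA", "famB"}, one containing {"famC", "famD"}, three containing only "famC" and three containing only "famD", only the famA/famB pair is returned, with an uncorrected p-value of 1/1716. A method working at the scale described above would additionally need to group significant pairs into larger gene modules and choose a multiple-testing strategy suited to very many family pairs; neither is attempted in this sketch.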
Institution: National Center for Biotechnology Information
Journal: PLoS Comput Biol (Research Article)
Published online: 2023-02-09
License: © 2023 Louwen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.