
NPOmix: A machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters

Microbial specialized metabolites are an important source of, and inspiration for, many pharmaceuticals and biotechnological products, and they play key roles in ecological processes. Untargeted metabolomics using liquid chromatography coupled with tandem mass spectrometry is an efficient technique to access m...


Bibliographic Details
Main Authors: Leão, Tiago F; Wang, Mingxun; da Silva, Ricardo; Gurevich, Alexey; Bauermeister, Anelize; Gomes, Paulo Wender P; Brejnrod, Asker; Glukhov, Evgenia; Aron, Allegra T; Louwen, Joris J R; Kim, Hyun Woo; Reher, Raphael; Fiore, Marli F; van der Hooft, Justin J J; Gerwick, Lena; Gerwick, William H; Bandeira, Nuno; Dorrestein, Pieter C
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Biological, Health, and Medical Sciences
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9802219/
https://www.ncbi.nlm.nih.gov/pubmed/36712343
http://dx.doi.org/10.1093/pnasnexus/pgac257
_version_ 1784861637438603264
author Leão, Tiago F
Wang, Mingxun
da Silva, Ricardo
Gurevich, Alexey
Bauermeister, Anelize
Gomes, Paulo Wender P
Brejnrod, Asker
Glukhov, Evgenia
Aron, Allegra T
Louwen, Joris J R
Kim, Hyun Woo
Reher, Raphael
Fiore, Marli F
van der Hooft, Justin J J
Gerwick, Lena
Gerwick, William H
Bandeira, Nuno
Dorrestein, Pieter C
author_sort Leão, Tiago F
collection PubMed
description Microbial specialized metabolites are an important source of, and inspiration for, many pharmaceuticals and biotechnological products, and they play key roles in ecological processes. Untargeted metabolomics using liquid chromatography coupled with tandem mass spectrometry is an efficient technique to access metabolites from fractions and even environmental crude extracts. Nevertheless, metabolomics is limited in predicting structures or bioactivities for cryptic metabolites. Efficiently linking the biosynthetic potential inferred from (meta)genomics to the specialized metabolome would accelerate drug discovery programs by allowing metabolomics to make use of genetic predictions. Here, we present a k-nearest neighbor classifier to systematically connect mass spectrometry fragmentation spectra to their corresponding biosynthetic gene clusters (independent of their chemical class). Our new pattern-based genome mining pipeline links biosynthetic genes to the metabolites they encode, as detected via mass spectrometry from bacterial cultures or environmental microbiomes. Using paired datasets that include validated gene-mass spectral links from the Paired Omics Data Platform, we demonstrate this approach by automatically linking 18 previously known mass spectra (17 for which the biosynthetic gene clusters can be found in the MIBiG database, plus palmyramide A) to their corresponding biosynthetic genes, which had previously been validated experimentally (e.g., via nuclear magnetic resonance or genetic engineering). We illustrate, with a computational example that users can reproduce, how to use our Natural Products Mixed Omics (NPOmix) tool for siderophore mining. We conclude that NPOmix minimizes the need for culturing (it worked well on microbiomes) and facilitates specialized metabolite prioritization based on integrative omics mining.
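The k-nearest neighbor idea mentioned in the description can be illustrated with a minimal sketch. This is not the NPOmix implementation: the abstract does not specify the feature representation, so the sketch assumes that each biosynthetic gene cluster family and each mass spectrum is summarized as a presence/absence vector over the same set of paired samples, and that similarity is measured with the Jaccard index. The names `jaccard`, `knn_predict`, `GCF_A`, `GCF_B` and all data values are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors most similar to query.

    train is a list of (vector, label) pairs; query is a vector of the same length.
    """
    ranked = sorted(train, key=lambda item: jaccard(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each column is one paired sample (genome + metabolome),
# 1 means the gene cluster family (or the spectrum) was detected in that sample.
bgc_fingerprints = [
    ([1, 1, 0, 0, 1], "GCF_A"),
    ([1, 1, 0, 0, 0], "GCF_A"),
    ([1, 1, 1, 0, 1], "GCF_A"),
    ([0, 0, 1, 1, 0], "GCF_B"),
    ([0, 0, 1, 1, 1], "GCF_B"),
    ([0, 1, 1, 1, 0], "GCF_B"),
]

# Assumed fingerprint of one query MS/MS spectrum across the same five samples.
spectrum_fingerprint = [1, 1, 0, 0, 1]

predicted = knn_predict(bgc_fingerprints, spectrum_fingerprint, k=3)
print(predicted)
```

In this toy case the query spectrum occurs in the same samples as the `GCF_A` fingerprints, so the three nearest neighbors all vote for that family. A real analysis would need a principled choice of `k`, similarity measure, and validation against known gene-spectrum links.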
format Online
Article
Text
id pubmed-9802219
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9802219 2023-01-26
NPOmix: A machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters
Leão, Tiago F; Wang, Mingxun; da Silva, Ricardo; Gurevich, Alexey; Bauermeister, Anelize; Gomes, Paulo Wender P; Brejnrod, Asker; Glukhov, Evgenia; Aron, Allegra T; Louwen, Joris J R; Kim, Hyun Woo; Reher, Raphael; Fiore, Marli F; van der Hooft, Justin J J; Gerwick, Lena; Gerwick, William H; Bandeira, Nuno; Dorrestein, Pieter C
PNAS Nexus
Biological, Health, and Medical Sciences
Microbial specialized metabolites are an important source of, and inspiration for, many pharmaceuticals and biotechnological products, and they play key roles in ecological processes. Untargeted metabolomics using liquid chromatography coupled with tandem mass spectrometry is an efficient technique to access metabolites from fractions and even environmental crude extracts. Nevertheless, metabolomics is limited in predicting structures or bioactivities for cryptic metabolites. Efficiently linking the biosynthetic potential inferred from (meta)genomics to the specialized metabolome would accelerate drug discovery programs by allowing metabolomics to make use of genetic predictions. Here, we present a k-nearest neighbor classifier to systematically connect mass spectrometry fragmentation spectra to their corresponding biosynthetic gene clusters (independent of their chemical class). Our new pattern-based genome mining pipeline links biosynthetic genes to the metabolites they encode, as detected via mass spectrometry from bacterial cultures or environmental microbiomes. Using paired datasets that include validated gene-mass spectral links from the Paired Omics Data Platform, we demonstrate this approach by automatically linking 18 previously known mass spectra (17 for which the biosynthetic gene clusters can be found in the MIBiG database, plus palmyramide A) to their corresponding biosynthetic genes, which had previously been validated experimentally (e.g., via nuclear magnetic resonance or genetic engineering). We illustrate, with a computational example that users can reproduce, how to use our Natural Products Mixed Omics (NPOmix) tool for siderophore mining. We conclude that NPOmix minimizes the need for culturing (it worked well on microbiomes) and facilitates specialized metabolite prioritization based on integrative omics mining.
Oxford University Press 2022-11-16
/pmc/articles/PMC9802219/ /pubmed/36712343 http://dx.doi.org/10.1093/pnasnexus/pgac257
Text en
© The Author(s) 2022. Published by Oxford University Press on behalf of National Academy of Sciences.
https://creativecommons.org/licenses/by/4.0/
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title NPOmix: A machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters
topic Biological, Health, and Medical Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9802219/
https://www.ncbi.nlm.nih.gov/pubmed/36712343
http://dx.doi.org/10.1093/pnasnexus/pgac257