AdenPredictor: accurate prediction of the adenylation domain specificity of nonribosomal peptide biosynthetic gene clusters in microbial genomes

Summary: Microbial natural products represent a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class that includes antibiotics, immunosuppressants, anticancer agents, toxins, siderophores, pigments, and cytostatics. The di...


Bibliographic Details
Main authors: Mongia, Mihir, Baral, Romel, Adduri, Abhinav, Yan, Donghui, Liu, Yudong, Bian, Yuying, Kim, Paul, Behsaz, Bahar, Mohimani, Hosein
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311338/
https://www.ncbi.nlm.nih.gov/pubmed/37387149
http://dx.doi.org/10.1093/bioinformatics/btad235
author Mongia, Mihir
Baral, Romel
Adduri, Abhinav
Yan, Donghui
Liu, Yudong
Bian, Yuying
Kim, Paul
Behsaz, Bahar
Mohimani, Hosein
author_sort Mongia, Mihir
collection PubMed
description Summary: Microbial natural products represent a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class that includes antibiotics, immunosuppressants, anticancer agents, toxins, siderophores, pigments, and cytostatics. The discovery of novel NRPs remains a laborious process because many NRPs consist of nonstandard amino acids that are assembled by nonribosomal peptide synthetases (NRPSs). Adenylation domains (A-domains) in NRPSs are responsible for the selection and activation of the monomers appearing in NRPs. During the past decade, several support vector machine-based algorithms have been developed for predicting the specificity of the monomers present in NRPs. These algorithms utilize physicochemical features of the amino acids present in the A-domains of NRPSs. In this article, we benchmarked the performance of various machine learning algorithms and features for predicting specificities of NRPSs and showed that the extra trees model paired with one-hot encoding features outperforms the existing approaches. Moreover, we show that unsupervised clustering of 453,560 A-domains reveals many clusters that correspond to potentially novel amino acids. While it is challenging to predict the chemical structure of these amino acids, we developed novel techniques to predict their various properties, including polarity, hydrophobicity, charge, and the presence of aromatic rings, carboxyl groups, and hydroxyl groups.
format Online
Article
Text
id pubmed-10311338
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10311338 2023-07-01 AdenPredictor: accurate prediction of the adenylation domain specificity of nonribosomal peptide biosynthetic gene clusters in microbial genomes. Mongia, Mihir; Baral, Romel; Adduri, Abhinav; Yan, Donghui; Liu, Yudong; Bian, Yuying; Kim, Paul; Behsaz, Bahar; Mohimani, Hosein. Bioinformatics. Bioinformatics of Microbes and Microbiomes.
Oxford University Press 2023-06-30 /pmc/articles/PMC10311338/ /pubmed/37387149 http://dx.doi.org/10.1093/bioinformatics/btad235 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title AdenPredictor: accurate prediction of the adenylation domain specificity of nonribosomal peptide biosynthetic gene clusters in microbial genomes
topic Bioinformatics of Microbes and Microbiomes