AdenPredictor: accurate prediction of the adenylation domain specificity of nonribosomal peptide biosynthetic gene clusters in microbial genomes
Summary: Microbial natural products represent a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class that includes antibiotics, immunosuppressants, anticancer agents, toxins, siderophores, pigments, and cytostatics. The di...
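Main authors: | Mongia, Mihir; Baral, Romel; Adduri, Abhinav; Yan, Donghui; Liu, Yudong; Bian, Yuying; Kim, Paul; Behsaz, Bahar; Mohimani, Hosein |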
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Bioinformatics of Microbes and Microbiomes |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311338/ https://www.ncbi.nlm.nih.gov/pubmed/37387149 http://dx.doi.org/10.1093/bioinformatics/btad235 |
_version_ | 1785066722431074304 |
---|---|
author | Mongia, Mihir; Baral, Romel; Adduri, Abhinav; Yan, Donghui; Liu, Yudong; Bian, Yuying; Kim, Paul; Behsaz, Bahar; Mohimani, Hosein |
collection | PubMed |
description | Summary: Microbial natural products represent a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class that includes antibiotics, immunosuppressants, anticancer agents, toxins, siderophores, pigments, and cytostatics. The discovery of novel NRPs remains a laborious process because many NRPs consist of nonstandard amino acids that are assembled by nonribosomal peptide synthetases (NRPSs). Adenylation domains (A-domains) in NRPSs are responsible for selection and activation of monomers appearing in NRPs. During the past decade, several support vector machine-based algorithms have been developed for predicting the specificity of the monomers present in NRPs. These algorithms utilize physicochemical features of the amino acids present in the A-domains of NRPSs. In this article, we benchmarked the performance of various machine learning algorithms and features for predicting specificities of NRPSs and we showed that the extra trees model paired with one-hot encoding features outperforms the existing approaches. Moreover, we show that unsupervised clustering of 453 560 A-domains reveals many clusters that correspond to potentially novel amino acids. While it is challenging to predict the chemical structure of these amino acids, we developed novel techniques to predict their various properties, including polarity, hydrophobicity, charge, and presence of aromatic rings, carboxyl, and hydroxyl groups. |
format | Online Article Text |
id | pubmed-10311338 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10311338 2023-07-01 Bioinformatics (Bioinformatics of Microbes and Microbiomes) Oxford University Press 2023-06-30 © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | AdenPredictor: accurate prediction of the adenylation domain specificity of nonribosomal peptide biosynthetic gene clusters in microbial genomes |
topic | Bioinformatics of Microbes and Microbiomes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311338/ https://www.ncbi.nlm.nih.gov/pubmed/37387149 http://dx.doi.org/10.1093/bioinformatics/btad235 |
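
The description above mentions one-hot encoding features for A-domain residues. As a rough illustration of what such an encoding could look like, here is a minimal sketch in plain Python. The alphabet (20 standard residues plus a gap symbol), the function name `one_hot_encode`, and the example signature are all assumptions made for illustration; the record does not say how AdenPredictor represents its input or which library it uses.

```python
from __future__ import annotations

# Hypothetical illustration only: the alphabet, names, and example signature
# below are assumptions, not taken from AdenPredictor.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
GAP = "-"                              # alignment gap symbol
ALPHABET = AMINO_ACIDS + GAP
INDEX = {residue: i for i, residue in enumerate(ALPHABET)}


def one_hot_encode(signature: str) -> list[int]:
    """Return a flat 0/1 vector with one block of len(ALPHABET) per residue."""
    vector = [0] * (len(signature) * len(ALPHABET))
    for position, residue in enumerate(signature.upper()):
        if residue not in INDEX:
            raise ValueError(f"unexpected residue {residue!r} at position {position}")
        # Set exactly one entry inside this position's block.
        vector[position * len(ALPHABET) + INDEX[residue]] = 1
    return vector


if __name__ == "__main__":
    example = "DAWTIAAVCK"  # made-up signature, for illustration only
    encoded = one_hot_encode(example)
    print(len(encoded), sum(encoded))
```

Each position contributes one block with a single 1, so the vector length is the signature length times the alphabet size. A vector like this could then be passed to a tree ensemble, for example scikit-learn's `ExtraTreesClassifier`, although the record does not state which implementation the authors used.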