Cargando…

MP4: a machine learning based classification tool for prediction and functional annotation of pathogenic proteins from metagenomic and genomic datasets

Bacteria can exceptionally evolve and develop pathogenic features making it crucial to determine novel pathogenic proteins for specific therapeutic interventions. Therefore, we have developed a machine-learning tool that predicts and functionally classifies pathogenic proteins into their respective...

Descripción completa

Detalles Bibliográficos
Autores principales: Gupta, Ankit, Malwe, Aditya S., Srivastava, Gopal N., Thoudam, Parikshit, Hibare, Keshav, Sharma, Vineet K.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9703692/
https://www.ncbi.nlm.nih.gov/pubmed/36443666
http://dx.doi.org/10.1186/s12859-022-05061-7
Descripción
Sumario:Bacteria can exceptionally evolve and develop pathogenic features making it crucial to determine novel pathogenic proteins for specific therapeutic interventions. Therefore, we have developed a machine-learning tool that predicts and functionally classifies pathogenic proteins into their respective pathogenic classes. Through construction of pathogenic proteins database and optimization of ML algorithms, Support Vector Machine was selected for the model construction. The developed SVM classifier yielded an accuracy of 81.72% on the blind-dataset and classified the proteins into three classes: Non-pathogenic proteins (Class-1), Antibiotic Resistance Proteins and Toxins (Class-2), and Secretory System Associated and capsular proteins (Class-3). The classifier provided an accuracy of 79% on real dataset-1, and 72% on real dataset-2. Based on the probability of prediction, users can estimate the pathogenicity and annotation of proteins under scrutiny. Tool will provide accurate prediction of pathogenic proteins in genomic and metagenomic datasets providing leads for experimental validations. Tool is available at: http://metagenomics.iiserb.ac.in/mp4. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-05061-7.