MP4: a machine learning based classification tool for prediction and functional annotation of pathogenic proteins from metagenomic and genomic datasets

Bacteria have an exceptional capacity to evolve and develop pathogenic features, making it crucial to identify novel pathogenic proteins for specific therapeutic interventions. Therefore, we have developed a machine-learning tool that predicts and functionally classifies pathogenic proteins into their respective...

Bibliographic Details
Main authors: Gupta, Ankit, Malwe, Aditya S., Srivastava, Gopal N., Thoudam, Parikshit, Hibare, Keshav, Sharma, Vineet K.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9703692/
https://www.ncbi.nlm.nih.gov/pubmed/36443666
http://dx.doi.org/10.1186/s12859-022-05061-7
author Gupta, Ankit
Malwe, Aditya S.
Srivastava, Gopal N.
Thoudam, Parikshit
Hibare, Keshav
Sharma, Vineet K.
collection PubMed
description Bacteria have an exceptional capacity to evolve and develop pathogenic features, making it crucial to identify novel pathogenic proteins for specific therapeutic interventions. Therefore, we have developed a machine-learning tool that predicts pathogenic proteins and functionally classifies them into their respective pathogenic classes. After construction of a pathogenic protein database and optimization of ML algorithms, a Support Vector Machine (SVM) was selected for model construction. The resulting SVM classifier yielded an accuracy of 81.72% on the blind dataset and classified proteins into three classes: Non-pathogenic proteins (Class-1), Antibiotic Resistance Proteins and Toxins (Class-2), and Secretory System Associated and capsular proteins (Class-3). The classifier achieved an accuracy of 79% on real dataset-1 and 72% on real dataset-2. Based on the prediction probability, users can estimate the pathogenicity and annotation of the proteins under scrutiny. The tool will provide accurate prediction of pathogenic proteins in genomic and metagenomic datasets, providing leads for experimental validation. The tool is available at: http://metagenomics.iiserb.ac.in/mp4. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-05061-7.
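The three-class scheme and the probability-based reading mentioned in the description can be illustrated with a minimal Python sketch. It is not MP4's code or interface: the function name `interpret_prediction`, the 0.5 cut-off, and the example probabilities are assumptions made purely for illustration; only the three class names come from the abstract.

```python
# Illustrative sketch only; not the MP4 implementation.
# Class names follow the abstract; everything else is assumed.
CLASS_LABELS = {
    1: "Non-pathogenic proteins",
    2: "Antibiotic Resistance Proteins and Toxins",
    3: "Secretory System Associated and capsular proteins",
}


def interpret_prediction(probabilities, min_confidence=0.5):
    """Map per-class probabilities to a class call.

    probabilities: dict of class number (1-3) -> probability.
    min_confidence: hypothetical cut-off below which the call is
    flagged as low confidence (not a value from the paper).
    """
    if set(probabilities) != set(CLASS_LABELS):
        raise ValueError("expected probabilities for classes 1, 2 and 3")
    # Pick the class with the highest predicted probability.
    best_class = max(probabilities, key=probabilities.get)
    best_prob = probabilities[best_class]
    return {
        "class": best_class,
        "label": CLASS_LABELS[best_class],
        "probability": best_prob,
        "pathogenic": best_class != 1,  # Class-1 is the non-pathogenic class
        "confident": best_prob >= min_confidence,
    }


if __name__ == "__main__":
    # Made-up probabilities for a single protein.
    call = interpret_prediction({1: 0.10, 2: 0.75, 3: 0.15})
    print(call["label"], call["probability"], call["confident"])
```

A caller would treat the returned probability as a rough confidence in the annotation; where to place the cut-off is a judgment call that the record does not specify.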
format Online
Article
Text
id pubmed-9703692
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9703692 2022-11-29 BMC Bioinformatics Research BioMed Central 2022-11-28 /pmc/articles/PMC9703692/ /pubmed/36443666 http://dx.doi.org/10.1186/s12859-022-05061-7 Text en © The Author(s) 2022
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title MP4: a machine learning based classification tool for prediction and functional annotation of pathogenic proteins from metagenomic and genomic datasets
topic Research