Bacterial Immunogenicity Prediction by Machine Learning Methods

The identification of protective immunogens is the most important and vigorous initial step in the long-lasting and expensive process of vaccine design and development. Machine learning (ML) methods are very effective in data mining and in the analysis of big data such as microbial proteomes. They a...

Bibliographic Details
Main Authors: Dimitrov, Ivan; Zaharieva, Nevena; Doytchinova, Irini
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7711804/
https://www.ncbi.nlm.nih.gov/pubmed/33265930
http://dx.doi.org/10.3390/vaccines8040709
author Dimitrov, Ivan
Zaharieva, Nevena
Doytchinova, Irini
collection PubMed
description The identification of protective immunogens is the most important and vigorous initial step in the long and expensive process of vaccine design and development. Machine learning (ML) methods are very effective in data mining and in the analysis of big data such as microbial proteomes. They can significantly reduce the experimental work needed to discover novel vaccine candidates. Here, we applied six supervised ML methods (partial least squares-based discriminant analysis, k-nearest neighbor (kNN), random forest (RF), support vector machine (SVM), random subspace method (RSM), and extreme gradient boosting (xgboost)) to a set of 317 known bacterial immunogens and 317 bacterial non-immunogens and derived models for immunogenicity prediction. The models were validated by internal 10-fold cross-validation on the training set and by an external test set. All of them showed good predictive ability, but the xgboost model displayed the most prominent ability to identify immunogens, recognizing 84% of the known immunogens in the test set. The combined RSM-kNN model was the best at recognizing non-immunogens, identifying 92% of them in the test set. The three best-performing ML models (xgboost, RSM-kNN, and RF) were implemented in the new version of the VaxiJen server, and the prediction of bacterial immunogens is now based on majority voting.
format Online
Article
Text
id pubmed-7711804
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7711804 2020-12-04 Bacterial Immunogenicity Prediction by Machine Learning Methods Dimitrov, Ivan Zaharieva, Nevena Doytchinova, Irini Vaccines (Basel) Article MDPI 2020-11-30 /pmc/articles/PMC7711804/ /pubmed/33265930 http://dx.doi.org/10.3390/vaccines8040709 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Bacterial Immunogenicity Prediction by Machine Learning Methods
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7711804/
https://www.ncbi.nlm.nih.gov/pubmed/33265930
http://dx.doi.org/10.3390/vaccines8040709
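
The abstract in the description field says that three models (xgboost, RSM-kNN, and RF) are combined by majority voting, and it reports the share of immunogens and of non-immunogens recognized in the test set. The sketch below illustrates those two ideas in plain Python. It is not the VaxiJen implementation: the function names, the 0/1 label encoding, and the toy prediction lists are assumptions made for illustration and are not data from the article.

```python
from typing import Sequence, Tuple


def majority_vote(predictions: Sequence[int]) -> int:
    """Combine binary votes (1 = immunogen, 0 = non-immunogen) by simple majority."""
    if len(predictions) % 2 == 0:
        raise ValueError("use an odd number of models so that ties cannot occur")
    if any(p not in (0, 1) for p in predictions):
        raise ValueError("predictions must be 0 or 1")
    # More than half of the models must vote 1 for the ensemble to predict 1.
    return 1 if 2 * sum(predictions) > len(predictions) else 0


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (fraction of immunogens recognized, fraction of non-immunogens recognized)."""
    pairs = list(zip(y_true, y_pred))
    positives = sum(1 for t, _ in pairs if t == 1)
    negatives = sum(1 for t, _ in pairs if t == 0)
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    return true_pos / positives, true_neg / negatives


# Hypothetical toy labels and per-model predictions (NOT results from the article).
y_true = [1, 1, 0, 0, 1, 0]
xgb_pred = [1, 1, 0, 0, 1, 0]
rsm_knn_pred = [1, 0, 0, 0, 0, 1]
rf_pred = [1, 1, 1, 0, 0, 0]

# One ensemble decision per protein: the label chosen by at least two of three models.
ensemble_pred = [majority_vote(votes) for votes in zip(xgb_pred, rsm_knn_pred, rf_pred)]
sensitivity, specificity = sensitivity_specificity(y_true, ensemble_pred)
print(ensemble_pred, sensitivity, specificity)
```

With three voters, a protein is called an immunogen only when at least two models agree, so a single model's error cannot flip the ensemble decision on its own.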