Cargando…
Ensemble Machine Learning Model to Predict SARS-CoV-2 T-Cell Epitopes as Potential Vaccine Targets
An ongoing outbreak of coronavirus disease 2019 (COVID-19), caused by a single-stranded RNA virus called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has caused a worldwide pandemic that continues to date. Vaccination has proven to be the most effective technique, by far, for the tr...
Autores principales: | , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
MDPI
2021
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8617960/ https://www.ncbi.nlm.nih.gov/pubmed/34829338 http://dx.doi.org/10.3390/diagnostics11111990 |
_version_ | 1784604632564105216 |
---|---|
author | Bukhari, Syed Nisar Hussain Jain, Amit Haq, Ehtishamul Mehbodniya, Abolfazl Webber, Julian |
author_facet | Bukhari, Syed Nisar Hussain Jain, Amit Haq, Ehtishamul Mehbodniya, Abolfazl Webber, Julian |
author_sort | Bukhari, Syed Nisar Hussain |
collection | PubMed |
description | An ongoing outbreak of coronavirus disease 2019 (COVID-19), caused by a single-stranded RNA virus called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has caused a worldwide pandemic that continues to date. Vaccination has proven to be the most effective technique, by far, for the treatment of COVID-19 and to combat the outbreak. Among all vaccine types, epitope-based peptide vaccines have received less attention and hold a large untapped potential for boosting vaccine safety and immunogenicity. Peptides used in such vaccine technology are chemically synthesized based on the amino acid sequences of antigenic proteins (T-cell epitopes) of the target pathogen. Using wet-lab experiments to identify antigenic proteins is very difficult, expensive, and time-consuming. We hereby propose an ensemble machine learning (ML) model for the prediction of T-cell epitopes (also known as immune relevant determinants or antigenic determinants) against SARS-CoV-2, utilizing physicochemical properties of amino acids. To train the model, we retrieved the experimentally determined SARS-CoV-2 T-cell epitopes from Immune Epitope Database and Analysis Resource (IEDB) repository. The model so developed achieved accuracy, AUC (Area under the ROC curve), Gini, specificity, sensitivity, F-score, and precision of 98.20%, 0.991, 0.994, 0.971, 0.982, 0.990, and 0.981, respectively, using a test set consisting of SARS-CoV-2 peptides (T-cell epitopes and non-epitopes) obtained from IEDB. The average accuracy of 97.98% was recorded in repeated 5-fold cross validation. Its comparison with 05 robust machine learning classifiers and existing T-cell epitope prediction techniques, such as NetMHC and CTLpred, suggest the proposed work as a better model. The predicted epitopes from the current model could possess a high probability to act as potential peptide vaccine candidates subjected to in vitro and in vivo scientific assessments. The model developed would help scientific community working in vaccine development save time to screen the active T-cell epitope candidates of SARS-CoV-2 against the inactive ones. |
format | Online Article Text |
id | pubmed-8617960 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-86179602021-11-27 Ensemble Machine Learning Model to Predict SARS-CoV-2 T-Cell Epitopes as Potential Vaccine Targets Bukhari, Syed Nisar Hussain Jain, Amit Haq, Ehtishamul Mehbodniya, Abolfazl Webber, Julian Diagnostics (Basel) Article An ongoing outbreak of coronavirus disease 2019 (COVID-19), caused by a single-stranded RNA virus called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has caused a worldwide pandemic that continues to date. Vaccination has proven to be the most effective technique, by far, for the treatment of COVID-19 and to combat the outbreak. Among all vaccine types, epitope-based peptide vaccines have received less attention and hold a large untapped potential for boosting vaccine safety and immunogenicity. Peptides used in such vaccine technology are chemically synthesized based on the amino acid sequences of antigenic proteins (T-cell epitopes) of the target pathogen. Using wet-lab experiments to identify antigenic proteins is very difficult, expensive, and time-consuming. We hereby propose an ensemble machine learning (ML) model for the prediction of T-cell epitopes (also known as immune relevant determinants or antigenic determinants) against SARS-CoV-2, utilizing physicochemical properties of amino acids. To train the model, we retrieved the experimentally determined SARS-CoV-2 T-cell epitopes from Immune Epitope Database and Analysis Resource (IEDB) repository. The model so developed achieved accuracy, AUC (Area under the ROC curve), Gini, specificity, sensitivity, F-score, and precision of 98.20%, 0.991, 0.994, 0.971, 0.982, 0.990, and 0.981, respectively, using a test set consisting of SARS-CoV-2 peptides (T-cell epitopes and non-epitopes) obtained from IEDB. The average accuracy of 97.98% was recorded in repeated 5-fold cross validation. Its comparison with 05 robust machine learning classifiers and existing T-cell epitope prediction techniques, such as NetMHC and CTLpred, suggest the proposed work as a better model. The predicted epitopes from the current model could possess a high probability to act as potential peptide vaccine candidates subjected to in vitro and in vivo scientific assessments. The model developed would help scientific community working in vaccine development save time to screen the active T-cell epitope candidates of SARS-CoV-2 against the inactive ones. MDPI 2021-10-26 /pmc/articles/PMC8617960/ /pubmed/34829338 http://dx.doi.org/10.3390/diagnostics11111990 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Bukhari, Syed Nisar Hussain Jain, Amit Haq, Ehtishamul Mehbodniya, Abolfazl Webber, Julian Ensemble Machine Learning Model to Predict SARS-CoV-2 T-Cell Epitopes as Potential Vaccine Targets |
title | Ensemble Machine Learning Model to Predict SARS-CoV-2 T-Cell Epitopes as Potential Vaccine Targets |
title_full | Ensemble Machine Learning Model to Predict SARS-CoV-2 T-Cell Epitopes as Potential Vaccine Targets |
title_fullStr | Ensemble Machine Learning Model to Predict SARS-CoV-2 T-Cell Epitopes as Potential Vaccine Targets |
title_full_unstemmed | Ensemble Machine Learning Model to Predict SARS-CoV-2 T-Cell Epitopes as Potential Vaccine Targets |
title_short | Ensemble Machine Learning Model to Predict SARS-CoV-2 T-Cell Epitopes as Potential Vaccine Targets |
title_sort | ensemble machine learning model to predict sars-cov-2 t-cell epitopes as potential vaccine targets |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8617960/ https://www.ncbi.nlm.nih.gov/pubmed/34829338 http://dx.doi.org/10.3390/diagnostics11111990 |
work_keys_str_mv | AT bukharisyednisarhussain ensemblemachinelearningmodeltopredictsarscov2tcellepitopesaspotentialvaccinetargets AT jainamit ensemblemachinelearningmodeltopredictsarscov2tcellepitopesaspotentialvaccinetargets AT haqehtishamul ensemblemachinelearningmodeltopredictsarscov2tcellepitopesaspotentialvaccinetargets AT mehbodniyaabolfazl ensemblemachinelearningmodeltopredictsarscov2tcellepitopesaspotentialvaccinetargets AT webberjulian ensemblemachinelearningmodeltopredictsarscov2tcellepitopesaspotentialvaccinetargets |