EpitopeVec: linear epitope prediction using deep protein sequence embeddings

Bibliographic Details
Main authors: Bahai, Akash; Asgari, Ehsaneddin; Mofrad, Mohammad R K; Kloetgen, Andreas; McHardy, Alice C
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8652027/
https://www.ncbi.nlm.nih.gov/pubmed/34180989
http://dx.doi.org/10.1093/bioinformatics/btab467
author Bahai, Akash
Asgari, Ehsaneddin
Mofrad, Mohammad R K
Kloetgen, Andreas
McHardy, Alice C
collection PubMed
description MOTIVATION: B-cell epitopes (BCEs) play a pivotal role in the development of peptide vaccines, immuno-diagnostic reagents and antibody production, and thus in infectious disease prevention and diagnostics in general. Experimental methods used to determine BCEs are costly and time-consuming. Therefore, it is essential to develop computational methods for the rapid identification of BCEs. Although several computational methods have been developed for this task, generalizability is still a major concern, where cross-testing of the classifiers trained and tested on different datasets has revealed accuracies of 51–53%. RESULTS: We describe a new method called EpitopeVec, which uses a combination of residue properties, modified antigenicity scales, and protein language model-based representations (protein vectors) as features of peptides for linear BCE predictions. Extensive benchmarking of EpitopeVec and other state-of-the-art methods for linear BCE prediction on several large and small datasets, as well as cross-testing, demonstrated an improvement in the performance of EpitopeVec over other methods in terms of accuracy and area under the curve. As the predictive performance depended on the species origin of the respective antigens (viral, bacterial and eukaryotic), we also trained our method on a large viral dataset to create a dedicated linear viral BCE predictor with improved cross-testing performance. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/hzi-bifo/epitope-prediction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8652027
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8652027 2021-12-08 EpitopeVec: linear epitope prediction using deep protein sequence embeddings. Bahai, Akash; Asgari, Ehsaneddin; Mofrad, Mohammad R K; Kloetgen, Andreas; McHardy, Alice C. Bioinformatics, Original Papers.
Oxford University Press 2021-06-28 /pmc/articles/PMC8652027/ /pubmed/34180989 http://dx.doi.org/10.1093/bioinformatics/btab467 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title EpitopeVec: linear epitope prediction using deep protein sequence embeddings
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8652027/
https://www.ncbi.nlm.nih.gov/pubmed/34180989
http://dx.doi.org/10.1093/bioinformatics/btab467