
iBCE-EL: A New Ensemble Learning Framework for Improved Linear B-Cell Epitope Prediction

Bibliographic Details
Main authors: Manavalan, Balachandran; Govindaraj, Rajiv Gandhi; Shin, Tae Hwan; Kim, Myeong Ok; Lee, Gwang
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2018
Subjects: Immunology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6072840/
https://www.ncbi.nlm.nih.gov/pubmed/30100904
http://dx.doi.org/10.3389/fimmu.2018.01695
collection PubMed
description Identification of B-cell epitopes (BCEs) is a fundamental step for epitope-based vaccine development, antibody production, and disease prevention and diagnosis. Due to the avalanche of protein sequence data discovered in the postgenomic age, it is essential to develop an automated computational method that enables fast and accurate identification of novel BCEs within the vast number of candidate proteins and peptides. Although several computational methods have been developed, their accuracy remains unreliable. Thus, developing a reliable model with significantly improved prediction is highly desirable. In this study, we first constructed a non-redundant data set of 5,550 experimentally validated BCEs and 6,893 non-BCEs from the Immune Epitope Database. We then developed iBCE-EL, a novel ensemble learning framework for improved linear BCE prediction that fuses two independent predictors: an extremely randomized tree (ERT) classifier, which uses a combination of physicochemical properties (PCP) and amino acid composition as input features, and a gradient boosting (GB) classifier, which uses a combination of dipeptide composition and PCP. Cross-validation analysis on a benchmarking data set showed that iBCE-EL performed better than the individual classifiers (ERT and GB), with a Matthews correlation coefficient (MCC) of 0.454. Furthermore, we evaluated the performance of iBCE-EL on an independent data set. Results show that iBCE-EL significantly outperformed the state-of-the-art method, with an MCC of 0.463. To the best of our knowledge, iBCE-EL is the first ensemble method for linear BCE prediction. iBCE-EL was implemented in a web-based platform, which is available at http://thegleelab.org/iBCE-EL. iBCE-EL offers two prediction modes: the first classifies peptide sequences as BCEs or non-BCEs, while the second lets users mine potential BCEs from protein sequences.
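The feature encodings named in the description (amino acid composition and dipeptide composition), the fusion of two classifiers' outputs, the MCC metric and the protein-mining mode can be illustrated with the minimal, standard-library Python sketch below. It is not the iBCE-EL implementation. The averaging fusion rule, the 0.5 threshold and the sliding-window size are assumptions, because the description does not state them. The PCP features are omitted because the description does not name the property scales. The trained ERT and GB models are stood in for by their output probabilities.

```python
import math
from itertools import product

# The 20 standard amino acids, used as the feature columns.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature vectors align.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: fraction of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for pair in pairs:
        if pair in counts:  # pairs with non-standard residues are not counted
            counts[pair] += 1
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


def fuse(p_ert: float, p_gb: float, threshold: float = 0.5) -> int:
    """HYPOTHETICAL fusion rule: average the BCE probabilities from the ERT and
    GB models and label the peptide a BCE (1) if the mean reaches the threshold."""
    return int((p_ert + p_gb) / 2 >= threshold)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def windows(protein: str, size: int) -> list[tuple[int, str]]:
    """Overlapping candidate peptides as (1-based start, peptide) pairs for the
    protein-mining mode. The window size is a HYPOTHETICAL user parameter."""
    return [(i + 1, protein[i:i + size]) for i in range(len(protein) - size + 1)]


if __name__ == "__main__":
    peptide = "ACDEFGHIK"
    features = aac(peptide) + dpc(peptide)  # PCP features not included
    print(len(features))  # 420
    print(mcc(tp=10, tn=10, fp=0, fn=0))  # 1.0
    print(windows("MKTAYIAK", 5))
```

In a full pipeline, the AAC+PCP vector would feed an ERT classifier and the DPC+PCP vector a GB classifier (for example, scikit-learn's ExtraTreesClassifier and GradientBoostingClassifier), and each model's predict_proba output would be passed to the fusion step.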
id pubmed-6072840
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Front Immunol (Immunology)
Published online: 2018-07-27
Copyright © 2018 Manavalan, Govindaraj, Shin, Kim and Lee. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Immunology