Positive-unlabeled learning for the prediction of conformational B-cell epitopes
BACKGROUND: The incomplete ground truth of training data of B-cell epitopes is a demanding issue in computational epitope prediction. The challenge is that only a small fraction of the surface residues of an antigen are confirmed as antigenic residues (positive training data); the remaining residues...
Main authors: | Ren, Jing; Liu, Qian; Ellis, John; Li, Jinyan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4682424/ https://www.ncbi.nlm.nih.gov/pubmed/26681157 http://dx.doi.org/10.1186/1471-2105-16-S18-S12 |
_version_ | 1782405888076677120 |
---|---|
author | Ren, Jing Liu, Qian Ellis, John Li, Jinyan |
author_facet | Ren, Jing Liu, Qian Ellis, John Li, Jinyan |
author_sort | Ren, Jing |
collection | PubMed |
description | BACKGROUND: The incomplete ground truth of training data of B-cell epitopes is a demanding issue in computational epitope prediction. The challenge is that only a small fraction of the surface residues of an antigen are confirmed as antigenic residues (positive training data); the remaining residues are unlabeled. As some of these uncertain residues can possibly be grouped to form novel but currently unknown epitopes, it is misguided to unanimously classify all the unlabeled residues as negative training data following the traditional supervised learning scheme. RESULTS: We propose a positive-unlabeled learning algorithm to address this problem. The key idea is to distinguish between epitope-likely residues and reliable negative residues in unlabeled data. The method has two steps: (1) identify reliable negative residues using a weighted SVM with a high recall; and (2) construct a classification model on the positive residues and the reliable negative residues. Complex-based 10-fold cross-validation was conducted to show that this method outperforms those commonly used predictors DiscoTope 2.0, ElliPro and SEPPA 2.0 in every aspect. We conducted four case studies, in which the approach was tested on antigens of West Nile virus, dihydrofolate reductase, beta-lactamase, and two Ebola antigens whose epitopes are currently unknown. All the results were assessed on a newly-established data set of antigen structures not bound by antibodies, instead of on antibody-bound antigen structures. These bound structures may contain unfair binding information such as bound-state B-factors and protrusion index which could exaggerate the epitope prediction performance. Source codes are available on request. |
format | Online Article Text |
id | pubmed-4682424 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-46824242015-12-21 Positive-unlabeled learning for the prediction of conformational B-cell epitopes Ren, Jing Liu, Qian Ellis, John Li, Jinyan BMC Bioinformatics Research BACKGROUND: The incomplete ground truth of training data of B-cell epitopes is a demanding issue in computational epitope prediction. The challenge is that only a small fraction of the surface residues of an antigen are confirmed as antigenic residues (positive training data); the remaining residues are unlabeled. As some of these uncertain residues can possibly be grouped to form novel but currently unknown epitopes, it is misguided to unanimously classify all the unlabeled residues as negative training data following the traditional supervised learning scheme. RESULTS: We propose a positive-unlabeled learning algorithm to address this problem. The key idea is to distinguish between epitope-likely residues and reliable negative residues in unlabeled data. The method has two steps: (1) identify reliable negative residues using a weighted SVM with a high recall; and (2) construct a classification model on the positive residues and the reliable negative residues. Complex-based 10-fold cross-validation was conducted to show that this method outperforms those commonly used predictors DiscoTope 2.0, ElliPro and SEPPA 2.0 in every aspect. We conducted four case studies, in which the approach was tested on antigens of West Nile virus, dihydrofolate reductase, beta-lactamase, and two Ebola antigens whose epitopes are currently unknown. All the results were assessed on a newly-established data set of antigen structures not bound by antibodies, instead of on antibody-bound antigen structures. These bound structures may contain unfair binding information such as bound-state B-factors and protrusion index which could exaggerate the epitope prediction performance. Source codes are available on request. 
BioMed Central 2015-12-09 /pmc/articles/PMC4682424/ /pubmed/26681157 http://dx.doi.org/10.1186/1471-2105-16-S18-S12 Text en Copyright © 2015 Ren et al. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Ren, Jing Liu, Qian Ellis, John Li, Jinyan Positive-unlabeled learning for the prediction of conformational B-cell epitopes |
title | Positive-unlabeled learning for the prediction of conformational B-cell epitopes |
title_full | Positive-unlabeled learning for the prediction of conformational B-cell epitopes |
title_fullStr | Positive-unlabeled learning for the prediction of conformational B-cell epitopes |
title_full_unstemmed | Positive-unlabeled learning for the prediction of conformational B-cell epitopes |
title_short | Positive-unlabeled learning for the prediction of conformational B-cell epitopes |
title_sort | positive-unlabeled learning for the prediction of conformational b-cell epitopes |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4682424/ https://www.ncbi.nlm.nih.gov/pubmed/26681157 http://dx.doi.org/10.1186/1471-2105-16-S18-S12 |
work_keys_str_mv | AT renjing positiveunlabeledlearningforthepredictionofconformationalbcellepitopes AT liuqian positiveunlabeledlearningforthepredictionofconformationalbcellepitopes AT ellisjohn positiveunlabeledlearningforthepredictionofconformationalbcellepitopes AT lijinyan positiveunlabeledlearningforthepredictionofconformationalbcellepitopes |
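
The two-step scheme summarized in the description field (first pick reliable negatives from the unlabeled residues with a weighted SVM tuned for high recall, then train a classifier on positives versus reliable negatives) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the record says is available only on request. The linear sub-gradient SVM, the synthetic 2-D "residue features", the `pos_weight=5.0` cost, the score threshold of 0, and all function names below are assumptions made purely for illustration.

```python
import random


def decision(model, x):
    """Signed score w.x + b of a linear model (w, b) for one feature vector."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_weighted_linear_svm(X, y, class_weight, lam=0.01, lr=0.01, n_iter=1000):
    """Full-batch sub-gradient descent on a class-weighted hinge loss.

    X: list of feature lists; y: labels in {+1, -1};
    class_weight: dict mapping each label to its misclassification cost.
    """
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    total = sum(class_weight[label] for label in y)
    for _ in range(n_iter):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (decision((w, b), xi)) < 1.0:  # margin violated: hinge active
                c = class_weight[yi] / total
                for j in range(d):
                    grad_w[j] -= c * yi * xi[j]
                grad_b -= c * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def two_step_pu(positives, unlabeled, pos_weight=5.0):
    """Two-step positive-unlabeled learning (illustrative sketch only)."""
    # Step 1: provisionally label every unlabeled point as negative, but make
    # missing a known positive costlier so recall on positives stays high.
    X = positives + unlabeled
    y = [1] * len(positives) + [-1] * len(unlabeled)
    step1 = train_weighted_linear_svm(X, y, {1: pos_weight, -1: 1.0})
    # Unlabeled points still scored negative are kept as "reliable negatives";
    # the rest are treated as epitope-likely and left out of step 2.
    reliable_neg = [x for x in unlabeled if decision(step1, x) < 0.0]
    # Step 2: train the final classifier on positives vs. reliable negatives.
    X2 = positives + reliable_neg
    y2 = [1] * len(positives) + [-1] * len(reliable_neg)
    final = train_weighted_linear_svm(X2, y2, {1: 1.0, -1: 1.0})
    return final, reliable_neg


def make_toy_data(seed=0):
    """Synthetic stand-in for residue features: two well-separated 2-D clouds."""
    rng = random.Random(seed)

    def cloud(center, n):
        return [[rng.gauss(center, 0.3), rng.gauss(center, 0.3)] for _ in range(n)]

    positives = cloud(2.0, 20)   # confirmed positives
    hidden = cloud(2.0, 20)      # true positives that are unlabeled
    negatives = cloud(-2.0, 40)  # true negatives that are unlabeled
    return positives, hidden, negatives


positives, hidden, negatives = make_toy_data()
model, reliable_neg = two_step_pu(positives, hidden + negatives)
```

On this toy data the hidden positives sit in the same cloud as the confirmed positives, so step 1 is expected to score them as epitope-likely and exclude them from the negative set used in step 2. Real residue features would be far less cleanly separated, and the weight and threshold would need tuning.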