Enhancement of conformational B-cell epitope prediction using CluSMOTE

BACKGROUND: A conformational B-cell epitope is one of the main components of vaccine design. It contains separate segments in its sequence, which are spatially close in the antigen chain. The availability of Ag-Ab complex data on the Protein Data Bank allows for the development of predictive methods. S...

Bibliographic Details
Main authors:	Solihah, Binti; Azhari, Azhari; Musdholifah, Aina
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2020
Subjects:	Bioinformatics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924438/ https://www.ncbi.nlm.nih.gov/pubmed/33816926 http://dx.doi.org/10.7717/peerj-cs.275

_version_	1783659089650253824
author	Solihah, Binti Azhari, Azhari Musdholifah, Aina
author_facet	Solihah, Binti Azhari, Azhari Musdholifah, Aina
author_sort	Solihah, Binti
collection	PubMed
description	BACKGROUND: A conformational B-cell epitope is one of the main components of vaccine design. It contains separate segments in its sequence, which are spatially close in the antigen chain. The availability of Ag-Ab complex data on the Protein Data Bank allows for the development of predictive methods. Several epitope prediction models have also been developed, including learning-based methods. However, the performance of these models is still not optimal. The main problem in learning-based prediction models is class imbalance. METHODS: This study proposes CluSMOTE, which is a combination of a cluster-based undersampling method and the Synthetic Minority Oversampling Technique. The approach is used to generate additional sample data to ensure that the conformational epitope dataset is balanced. The Hierarchical DBSCAN algorithm is used to identify the clusters in the majority class. Randomly selected data are taken from each cluster, considering the oversampling degree, and combined with the minority class data. The balanced data are used as the training dataset to develop a conformational epitope prediction model. Furthermore, two binary classification methods, Support Vector Machine and Decision Tree, are separately used to develop prediction models and to evaluate the performance of CluSMOTE in predicting conformational B-cell epitopes. The experiment is focused on determining the best parameters for CluSMOTE. Two independent datasets are used to compare the proposed prediction model with state-of-the-art methods. The first and the second datasets represent the general protein and the glycoprotein antigens, respectively. RESULTS: The experimental results show that CluSMOTE Decision Tree outperformed the Support Vector Machine in terms of AUC and Gmean as performance measurements. The mean AUCs of CluSMOTE Decision Tree on the Kringelum and the SEPPA 3 test sets are 0.83 and 0.766, respectively. This shows that CluSMOTE Decision Tree is better than other methods on the general protein antigens, though comparable with SEPPA 3 on the glycoprotein antigens.
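The resampling idea in the METHODS part of the description can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that cluster labels for the majority class are already available (the abstract names Hierarchical DBSCAN; any clustering could supply them), that the size of the oversampled minority class sets the number of majority points to keep, and that a simplified nearest-neighbour interpolation stands in for SMOTE. All function and parameter names (undersample_by_cluster, smote, clusmote_like, oversample_factor) are hypothetical.

```python
import math
import random


def undersample_by_cluster(majority, labels, n_keep, rng):
    """Randomly keep about n_keep majority points, drawn from every cluster."""
    clusters = {}
    for point, label in zip(majority, labels):
        clusters.setdefault(label, []).append(point)
    kept = []
    for members in clusters.values():
        # Assumption: each cluster contributes in proportion to its size,
        # and always at least one point.
        quota = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept


def smote(minority, n_new, k, rng):
    """Simplified SMOTE: interpolate between a point and one of its k nearest neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def clusmote_like(majority, labels, minority, oversample_factor, k=3, seed=0):
    """Oversample the minority class, then undersample the majority class to match it."""
    rng = random.Random(seed)
    n_new = int(len(minority) * (oversample_factor - 1))
    new_minority = list(minority) + smote(minority, n_new, k, rng)
    kept_majority = undersample_by_cluster(majority, labels, len(new_minority), rng)
    return kept_majority, new_minority


# Toy data: 12 majority points in three clusters, 4 minority points.
majority = [(float(i), float(i % 3)) for i in range(12)]
labels = [0] * 6 + [1] * 4 + [2] * 2
minority = [(20.0, 20.0), (21.0, 20.0), (20.0, 21.0), (21.0, 21.0)]

kept, balanced_minority = clusmote_like(majority, labels, minority, oversample_factor=2)
print(len(kept), len(balanced_minority))
```

In this sketch the two returned lists would be concatenated and used as the balanced training set for a classifier such as a Decision Tree or a Support Vector Machine, the two classifiers named in the description.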
format	Online Article Text
id	pubmed-7924438
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-7924438 2021-04-02 Enhancement of conformational B-cell epitope prediction using CluSMOTE Solihah, Binti Azhari, Azhari Musdholifah, Aina PeerJ Comput Sci Bioinformatics PeerJ Inc. 2020-06-01 /pmc/articles/PMC7924438/ /pubmed/33816926 http://dx.doi.org/10.7717/peerj-cs.275 Text en ©2020 Solihah et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	Enhancement of conformational B-cell epitope prediction using CluSMOTE
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924438/ https://www.ncbi.nlm.nih.gov/pubmed/33816926 http://dx.doi.org/10.7717/peerj-cs.275

Similar Items