
Improving peptide-MHC class I binding prediction for unbalanced datasets

BACKGROUND: Establishment of peptide binding to Major Histocompatibility Complex class I (MHCI) is a crucial step in the development of subunit vaccines and prediction of such binding could greatly reduce costs and accelerate the experimental process of identifying immunogenic peptides. Many methods...


Bibliographic Details
Main Authors:	Sales, Ana Paula; Tomaras, Georgia D; Kepler, Thomas B
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2586639/ https://www.ncbi.nlm.nih.gov/pubmed/18803836 http://dx.doi.org/10.1186/1471-2105-9-385

author	Sales, Ana Paula; Tomaras, Georgia D; Kepler, Thomas B
collection	PubMed
description	BACKGROUND: Establishment of peptide binding to Major Histocompatibility Complex class I (MHCI) is a crucial step in the development of subunit vaccines and prediction of such binding could greatly reduce costs and accelerate the experimental process of identifying immunogenic peptides. Many methods have been applied to the prediction of peptide-MHCI binding, with some achieving outstanding performance. Because of the experimental methods used to measure binding or affinity between peptides and MHCI molecules, however, available datasets are enriched for nonbinders, and thus highly unbalanced. Although there is no consensus on the ideal class distribution for training sets, extremely unbalanced datasets can be detrimental to the performance of prediction algorithms. RESULTS: We have developed a decision-theoretic framework to construct cost-sensitive trees to predict peptide-MHCI binding and have used them to 1) Assess the impact of the training data's class distribution on classifier accuracy, and 2) Compare resampling and cost-sensitive methods as approaches to compensate for training data imbalance. Our results confirm that highly unbalanced training sets can reduce the accuracy of classifier predictions and show that, in the peptide-MHCI binding context, resampling methods do not improve the classifier performance. In contrast, cost-sensitive methods significantly improve accuracy of decision trees. Finally, we propose the use of a training scheme that, when the training set is enriched for nonbinders, consistently improves the overall classifier accuracy compared to cost-insensitive classifiers and, in particular, increases the sensitivity of the classifiers. This method minimizes the expected classification cost for large datasets. CONCLUSION: Our method consistently improves the performance of decision trees in predicting peptide-MHC class I binding by using cost-balancing techniques to compensate for the imbalance in the training dataset.
format	Text
id	pubmed-2586639
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2586639 2008-11-25 Improving peptide-MHC class I binding prediction for unbalanced datasets Sales, Ana Paula; Tomaras, Georgia D; Kepler, Thomas B BMC Bioinformatics Research Article BioMed Central 2008-09-19 /pmc/articles/PMC2586639/ /pubmed/18803836 http://dx.doi.org/10.1186/1471-2105-9-385 Text en Copyright © 2008 Sales et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Improving peptide-MHC class I binding prediction for unbalanced datasets
topic	Research Article
