An evaluation of different classification algorithms for protein sequence-based reverse vaccinology prediction
Main authors: | Heinson, Ashley I.; Ewing, Rob M.; Holloway, John W.; Woelk, Christopher H.; Niranjan, Mahesan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6910663/ https://www.ncbi.nlm.nih.gov/pubmed/31834914 http://dx.doi.org/10.1371/journal.pone.0226256 |
author | Heinson, Ashley I.; Ewing, Rob M.; Holloway, John W.; Woelk, Christopher H.; Niranjan, Mahesan |
collection | PubMed |
description | Previous work has shown that proteins that have the potential to be vaccine candidates can be predicted from features derived from their amino acid sequences. In this work, we make an empirical comparison across various machine learning classifiers on this sequence-based inference problem. Using systematic cross validation on a dataset of 200 known vaccine candidates and 200 negative examples, with a set of 525 features derived from the AA sequences and feature selection applied through a greedy backward elimination approach, we show that simple classification algorithms often perform as well as more complex support vector kernel machines. The work also includes a novel cross validation applied across bacterial species, i.e. the validation proteins all come from a specific species of bacterium not represented in the training set. We termed this type of validation Leave One Bacteria Out Validation (LOBOV). |
id | pubmed-6910663 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | PLoS One, Research Article, published 2019-12-13 |
rights | © 2019 Heinson et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
topic | Research Article |