An evaluation of different classification algorithms for protein sequence-based reverse vaccinology prediction

Previous work has shown that proteins that have the potential to be vaccine candidates can be predicted from features derived from their amino acid sequences. In this work, we make an empirical comparison across various machine learning classifiers on this sequence-based inference problem. Using sys...

Bibliographic Details
Main authors: Heinson, Ashley I., Ewing, Rob M., Holloway, John W., Woelk, Christopher H., Niranjan, Mahesan
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6910663/
https://www.ncbi.nlm.nih.gov/pubmed/31834914
http://dx.doi.org/10.1371/journal.pone.0226256
author Heinson, Ashley I.
Ewing, Rob M.
Holloway, John W.
Woelk, Christopher H.
Niranjan, Mahesan
collection PubMed
description Previous work has shown that proteins that have the potential to be vaccine candidates can be predicted from features derived from their amino acid sequences. In this work, we make an empirical comparison across various machine learning classifiers on this sequence-based inference problem. Using systematic cross validation on a dataset of 200 known vaccine candidates and 200 negative examples, with a set of 525 features derived from the AA sequences and feature selection applied through a greedy backward elimination approach, we show that simple classification algorithms often perform as well as more complex support vector kernel machines. The work also includes a novel cross validation applied across bacterial species, i.e. the validation proteins all come from a specific species of bacterium not represented in the training set. We termed this type of validation Leave One Bacteria Out Validation (LOBOV).
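The species-wise validation described in the abstract (LOBOV) can be illustrated with a minimal sketch: every protein from one bacterial species is held out for testing while the model is trained on all remaining species. The function names, the toy feature vectors, and the nearest-centroid classifier below are assumptions made for illustration only; they are not the authors' code, features, or classifiers.

```python
from collections import defaultdict


def leave_one_species_out(species):
    """Yield (held_out, train_idx, test_idx) once per distinct species label."""
    by_species = defaultdict(list)
    for i, s in enumerate(species):
        by_species[s].append(i)
    for held_out, test_idx in by_species.items():
        # Training rows are all rows that do NOT belong to the held-out species.
        train_idx = [i for i, s in enumerate(species) if s != held_out]
        yield held_out, train_idx, test_idx


def nearest_centroid_fit(X, y):
    """Stand-in classifier (assumption): one mean feature vector per class."""
    by_label = defaultdict(list)
    for row, label in zip(X, y):
        by_label[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in by_label.items()
    }


def nearest_centroid_predict(model, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(model, key=lambda label: sq_dist(model[label]))


def lobov_accuracy(X, y, species):
    """Accuracy on each held-out species, training on all other species."""
    scores = {}
    for held_out, train_idx, test_idx in leave_one_species_out(species):
        model = nearest_centroid_fit(
            [X[i] for i in train_idx], [y[i] for i in train_idx]
        )
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test_idx)
        scores[held_out] = hits / len(test_idx)
    return scores


# Hypothetical toy data: 2 features per protein, label 1 = vaccine candidate.
X = [[0.0, 0.1], [1.0, 0.9], [0.1, 0.0], [0.9, 1.0], [0.2, 0.1], [1.1, 1.0]]
y = [0, 1, 0, 1, 0, 1]
species = ["A", "A", "B", "B", "C", "C"]

scores = lobov_accuracy(X, y, species)
for name, acc in scores.items():
    print(f"held-out species {name}: accuracy {acc:.2f}")
```

Splitting by species rather than at random keeps proteins from the test species out of training, which is the property the abstract attributes to LOBOV. Any classifier with a fit and predict step could replace the nearest-centroid stand-in.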
format Online
Article
Text
id pubmed-6910663
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-69106632019-12-27 An evaluation of different classification algorithms for protein sequence-based reverse vaccinology prediction Heinson, Ashley I. Ewing, Rob M. Holloway, John W. Woelk, Christopher H. Niranjan, Mahesan PLoS One Research Article Previous work has shown that proteins that have the potential to be vaccine candidates can be predicted from features derived from their amino acid sequences. In this work, we make an empirical comparison across various machine learning classifiers on this sequence-based inference problem. Using systematic cross validation on a dataset of 200 known vaccine candidates and 200 negative examples, with a set of 525 features derived from the AA sequences and feature selection applied through a greedy backward elimination approach, we show that simple classification algorithms often perform as well as more complex support vector kernel machines. The work also includes a novel cross validation applied across bacterial species, i.e. the validation proteins all come from a specific species of bacterium not represented in the training set. We termed this type of validation Leave One Bacteria Out Validation (LOBOV). Public Library of Science 2019-12-13 /pmc/articles/PMC6910663/ /pubmed/31834914 http://dx.doi.org/10.1371/journal.pone.0226256 Text en © 2019 Heinson et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title An evaluation of different classification algorithms for protein sequence-based reverse vaccinology prediction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6910663/
https://www.ncbi.nlm.nih.gov/pubmed/31834914
http://dx.doi.org/10.1371/journal.pone.0226256