PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction
MOTIVATION: Machine learning methods can be used to support scientific discovery in healthcare-related research fields. However, these methods can only be reliably used if they can be trained on high-quality and curated datasets. Currently, no such dataset for the exploration of Plasmodium falciparu...
Main authors: | Ditz, Jonas C; Wistuba-Hamprecht, Jacqueline; Maier, Timo; Fendel, Rolf; Pfeifer, Nico; Reuter, Bernhard |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Oxford University Press
2023
|
Subjects: | Biomedical Informatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311333/ https://www.ncbi.nlm.nih.gov/pubmed/37387133 http://dx.doi.org/10.1093/bioinformatics/btad206 |
_version_ | 1785066721268203520 |
---|---|
author | Ditz, Jonas C Wistuba-Hamprecht, Jacqueline Maier, Timo Fendel, Rolf Pfeifer, Nico Reuter, Bernhard |
author_facet | Ditz, Jonas C Wistuba-Hamprecht, Jacqueline Maier, Timo Fendel, Rolf Pfeifer, Nico Reuter, Bernhard |
author_sort | Ditz, Jonas C |
collection | PubMed |
description | MOTIVATION: Machine learning methods can be used to support scientific discovery in healthcare-related research fields. However, these methods can only be reliably used if they can be trained on high-quality and curated datasets. Currently, no such dataset for the exploration of Plasmodium falciparum protein antigen candidates exists. The parasite P.falciparum causes the infectious disease malaria. Thus, identifying potential antigens is of utmost importance for the development of antimalarial drugs and vaccines. Since exploring antigen candidates experimentally is an expensive and time-consuming process, applying machine learning methods to support this process has the potential to accelerate the development of drugs and vaccines, which are needed for fighting and controlling malaria. RESULTS: We developed PlasmoFAB, a curated benchmark that can be used to train machine learning methods for the exploration of P.falciparum protein antigen candidates. We combined an extensive literature search with domain expertise to create high-quality labels for P.falciparum specific proteins that distinguish between antigen candidates and intracellular proteins. Additionally, we used our benchmark to compare different well-known prediction models and available protein localization prediction services on the task of identifying protein antigen candidates. We show that available general-purpose services are unable to provide sufficient performance on identifying protein antigen candidates and are outperformed by our models that were trained on this tailored data. AVAILABILITY AND IMPLEMENTATION: PlasmoFAB is publicly available on Zenodo with DOI 10.5281/zenodo.7433087. Furthermore, all scripts that were used in the creation of PlasmoFAB and the training and evaluation of machine learning models are open source and publicly available on GitHub here: https://github.com/msmdev/PlasmoFAB. |
format | Online Article Text |
id | pubmed-10311333 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-103113332023-07-01 PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction Ditz, Jonas C Wistuba-Hamprecht, Jacqueline Maier, Timo Fendel, Rolf Pfeifer, Nico Reuter, Bernhard Bioinformatics Biomedical Informatics MOTIVATION: Machine learning methods can be used to support scientific discovery in healthcare-related research fields. However, these methods can only be reliably used if they can be trained on high-quality and curated datasets. Currently, no such dataset for the exploration of Plasmodium falciparum protein antigen candidates exists. The parasite P.falciparum causes the infectious disease malaria. Thus, identifying potential antigens is of utmost importance for the development of antimalarial drugs and vaccines. Since exploring antigen candidates experimentally is an expensive and time-consuming process, applying machine learning methods to support this process has the potential to accelerate the development of drugs and vaccines, which are needed for fighting and controlling malaria. RESULTS: We developed PlasmoFAB, a curated benchmark that can be used to train machine learning methods for the exploration of P.falciparum protein antigen candidates. We combined an extensive literature search with domain expertise to create high-quality labels for P.falciparum specific proteins that distinguish between antigen candidates and intracellular proteins. Additionally, we used our benchmark to compare different well-known prediction models and available protein localization prediction services on the task of identifying protein antigen candidates. We show that available general-purpose services are unable to provide sufficient performance on identifying protein antigen candidates and are outperformed by our models that were trained on this tailored data. AVAILABILITY AND IMPLEMENTATION: PlasmoFAB is publicly available on Zenodo with DOI 10.5281/zenodo.7433087. 
Furthermore, all scripts that were used in the creation of PlasmoFAB and the training and evaluation of machine learning models are open source and publicly available on GitHub here: https://github.com/msmdev/PlasmoFAB. Oxford University Press 2023-06-30 /pmc/articles/PMC10311333/ /pubmed/37387133 http://dx.doi.org/10.1093/bioinformatics/btad206 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Biomedical Informatics Ditz, Jonas C Wistuba-Hamprecht, Jacqueline Maier, Timo Fendel, Rolf Pfeifer, Nico Reuter, Bernhard PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction |
title | PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction |
topic | Biomedical Informatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311333/ https://www.ncbi.nlm.nih.gov/pubmed/37387133 http://dx.doi.org/10.1093/bioinformatics/btad206 |