
PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction

MOTIVATION: Machine learning methods can be used to support scientific discovery in healthcare-related research fields. However, these methods can only be reliably used if they can be trained on high-quality and curated datasets. Currently, no such dataset for the exploration of Plasmodium falciparu...


Bibliographic Details
Main authors: Ditz, Jonas C; Wistuba-Hamprecht, Jacqueline; Maier, Timo; Fendel, Rolf; Pfeifer, Nico; Reuter, Bernhard
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311333/
https://www.ncbi.nlm.nih.gov/pubmed/37387133
http://dx.doi.org/10.1093/bioinformatics/btad206
author Ditz, Jonas C
Wistuba-Hamprecht, Jacqueline
Maier, Timo
Fendel, Rolf
Pfeifer, Nico
Reuter, Bernhard
collection PubMed
description MOTIVATION: Machine learning methods can support scientific discovery in healthcare-related research fields. However, these methods can only be used reliably if they are trained on high-quality, curated datasets. Currently, no such dataset exists for the exploration of Plasmodium falciparum protein antigen candidates. The parasite P. falciparum causes the infectious disease malaria. Thus, identifying potential antigens is of utmost importance for the development of antimalarial drugs and vaccines. Since exploring antigen candidates experimentally is expensive and time-consuming, applying machine learning methods to support this process has the potential to accelerate the development of the drugs and vaccines needed to fight and control malaria. RESULTS: We developed PlasmoFAB, a curated benchmark that can be used to train machine learning methods for the exploration of P. falciparum protein antigen candidates. We combined an extensive literature search with domain expertise to create high-quality labels for P. falciparum-specific proteins that distinguish between antigen candidates and intracellular proteins. Additionally, we used our benchmark to compare different well-known prediction models and available protein localization prediction services on the task of identifying protein antigen candidates. We show that available general-purpose services do not perform sufficiently well at identifying protein antigen candidates and are outperformed by our models, which were trained on this tailored data. AVAILABILITY AND IMPLEMENTATION: PlasmoFAB is publicly available on Zenodo with DOI 10.5281/zenodo.7433087. Furthermore, all scripts used in the creation of PlasmoFAB and in the training and evaluation of machine learning models are open source and publicly available on GitHub at https://github.com/msmdev/PlasmoFAB.
format Online
Article
Text
id pubmed-10311333
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10311333 2023-07-01 PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction Ditz, Jonas C Wistuba-Hamprecht, Jacqueline Maier, Timo Fendel, Rolf Pfeifer, Nico Reuter, Bernhard Bioinformatics Biomedical Informatics Oxford University Press 2023-06-30 /pmc/articles/PMC10311333/ /pubmed/37387133 http://dx.doi.org/10.1093/bioinformatics/btad206 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction
topic Biomedical Informatics