PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction

MOTIVATION: Machine learning methods can be used to support scientific discovery in healthcare-related research fields. However, these methods can only be reliably used if they can be trained on high-quality and curated datasets. Currently, no such dataset for the exploration of Plasmodium falciparu...

Bibliographic Details
Main Authors:	Ditz, Jonas C; Wistuba-Hamprecht, Jacqueline; Maier, Timo; Fendel, Rolf; Pfeifer, Nico; Reuter, Bernhard
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Biomedical Informatics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311333/ https://www.ncbi.nlm.nih.gov/pubmed/37387133 http://dx.doi.org/10.1093/bioinformatics/btad206

_version_	1785066721268203520
author	Ditz, Jonas C; Wistuba-Hamprecht, Jacqueline; Maier, Timo; Fendel, Rolf; Pfeifer, Nico; Reuter, Bernhard
author_facet	Ditz, Jonas C; Wistuba-Hamprecht, Jacqueline; Maier, Timo; Fendel, Rolf; Pfeifer, Nico; Reuter, Bernhard
author_sort	Ditz, Jonas C
collection	PubMed
description	MOTIVATION: Machine learning methods can be used to support scientific discovery in healthcare-related research fields. However, these methods can only be reliably used if they can be trained on high-quality and curated datasets. Currently, no such dataset for the exploration of Plasmodium falciparum protein antigen candidates exists. The parasite P.falciparum causes the infectious disease malaria. Thus, identifying potential antigens is of utmost importance for the development of antimalarial drugs and vaccines. Since exploring antigen candidates experimentally is an expensive and time-consuming process, applying machine learning methods to support this process has the potential to accelerate the development of drugs and vaccines, which are needed for fighting and controlling malaria. RESULTS: We developed PlasmoFAB, a curated benchmark that can be used to train machine learning methods for the exploration of P.falciparum protein antigen candidates. We combined an extensive literature search with domain expertise to create high-quality labels for P.falciparum specific proteins that distinguish between antigen candidates and intracellular proteins. Additionally, we used our benchmark to compare different well-known prediction models and available protein localization prediction services on the task of identifying protein antigen candidates. We show that available general-purpose services are unable to provide sufficient performance on identifying protein antigen candidates and are outperformed by our models that were trained on this tailored data. AVAILABILITY AND IMPLEMENTATION: PlasmoFAB is publicly available on Zenodo with DOI 10.5281/zenodo.7433087. Furthermore, all scripts that were used in the creation of PlasmoFAB and the training and evaluation of machine learning models are open source and publicly available on GitHub here: https://github.com/msmdev/PlasmoFAB.
format	Online Article Text
id	pubmed-10311333
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-103113332023-07-01 PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction Ditz, Jonas C Wistuba-Hamprecht, Jacqueline Maier, Timo Fendel, Rolf Pfeifer, Nico Reuter, Bernhard Bioinformatics Biomedical Informatics MOTIVATION: Machine learning methods can be used to support scientific discovery in healthcare-related research fields. However, these methods can only be reliably used if they can be trained on high-quality and curated datasets. Currently, no such dataset for the exploration of Plasmodium falciparum protein antigen candidates exists. The parasite P.falciparum causes the infectious disease malaria. Thus, identifying potential antigens is of utmost importance for the development of antimalarial drugs and vaccines. Since exploring antigen candidates experimentally is an expensive and time-consuming process, applying machine learning methods to support this process has the potential to accelerate the development of drugs and vaccines, which are needed for fighting and controlling malaria. RESULTS: We developed PlasmoFAB, a curated benchmark that can be used to train machine learning methods for the exploration of P.falciparum protein antigen candidates. We combined an extensive literature search with domain expertise to create high-quality labels for P.falciparum specific proteins that distinguish between antigen candidates and intracellular proteins. Additionally, we used our benchmark to compare different well-known prediction models and available protein localization prediction services on the task of identifying protein antigen candidates. We show that available general-purpose services are unable to provide sufficient performance on identifying protein antigen candidates and are outperformed by our models that were trained on this tailored data. AVAILABILITY AND IMPLEMENTATION: PlasmoFAB is publicly available on Zenodo with DOI 10.5281/zenodo.7433087. 
Furthermore, all scripts that were used in the creation of PlasmoFAB and the training and evaluation of machine learning models are open source and publicly available on GitHub here: https://github.com/msmdev/PlasmoFAB. Oxford University Press 2023-06-30 /pmc/articles/PMC10311333/ /pubmed/37387133 http://dx.doi.org/10.1093/bioinformatics/btad206 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Biomedical Informatics Ditz, Jonas C Wistuba-Hamprecht, Jacqueline Maier, Timo Fendel, Rolf Pfeifer, Nico Reuter, Bernhard PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction
title	PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction
title_full	PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction
title_fullStr	PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction
title_full_unstemmed	PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction
title_short	PlasmoFAB: a benchmark to foster machine learning for Plasmodium falciparum protein antigen candidate prediction
title_sort	plasmofab: a benchmark to foster machine learning for plasmodium falciparum protein antigen candidate prediction
topic	Biomedical Informatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311333/ https://www.ncbi.nlm.nih.gov/pubmed/37387133 http://dx.doi.org/10.1093/bioinformatics/btad206
work_keys_str_mv	AT ditzjonasc plasmofababenchmarktofostermachinelearningforplasmodiumfalciparumproteinantigencandidateprediction AT wistubahamprechtjacqueline plasmofababenchmarktofostermachinelearningforplasmodiumfalciparumproteinantigencandidateprediction AT maiertimo plasmofababenchmarktofostermachinelearningforplasmodiumfalciparumproteinantigencandidateprediction AT fendelrolf plasmofababenchmarktofostermachinelearningforplasmodiumfalciparumproteinantigencandidateprediction AT pfeifernico plasmofababenchmarktofostermachinelearningforplasmodiumfalciparumproteinantigencandidateprediction AT reuterbernhard plasmofababenchmarktofostermachinelearningforplasmodiumfalciparumproteinantigencandidateprediction

Similar Items