
Tracking financing for global common goods for health: A machine learning approach using natural language processing techniques


Bibliographic Details
Main authors:	Dixit, Siddharth; Mao, Wenhui; McDade, Kaci Kennedy; Schäferhoff, Marco; Ogbuoji, Osondu; Yamey, Gavin
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Public Health
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9712779/ https://www.ncbi.nlm.nih.gov/pubmed/36466519 http://dx.doi.org/10.3389/fpubh.2022.1031147

author	Dixit, Siddharth; Mao, Wenhui; McDade, Kaci Kennedy; Schäferhoff, Marco; Ogbuoji, Osondu; Yamey, Gavin
author_sort	Dixit, Siddharth
collection	PubMed
description	OBJECTIVE: Tracking global health funding is a crucial but time-consuming and labor-intensive process. This study aimed to develop a framework to automate the tracking of global health spending using natural language processing (NLP) and machine learning (ML) algorithms. We used the global common goods for health (CGH) categories developed by Schäferhoff et al. to design and evaluate ML models. METHODS: To train and validate the ML models, we used data curated by Schäferhoff et al., which tracked official development assistance (ODA) disbursements to global CGH for 2013, 2015, and 2017. To process the raw text, we applied several NLP techniques, such as removing stop words, lemmatization, and creating synthetic text to balance the dataset. We trained and tested four supervised ML algorithms on the pre-coded dataset: random forest (RF), XGBoost, support vector machine (SVM), and multinomial naïve Bayes (MNB) (see Glossary). We then applied the best model to a dataset that had not been manually coded to predict the financing for CGH in 2019. RESULTS: After the models were trained on the training dataset (n = 10,534), their weighted average F1-scores (a measure of an ML model's performance) on the testing dataset (n = 2,634) ranged from 0.79 to 0.83 across the four models, and the RF model performed best (F1-score = 0.83). The total donor support for CGH projects predicted by the RF model was $2.24 billion across the 3 years, very close to the $2.25 billion derived from coding and classification by humans. Applying the trained RF model to the 2019 dataset, we predicted that total funding for global CGH was about $2.7 billion for 730 CGH projects. CONCLUSION: We have demonstrated that NLP and ML can be a feasible and efficient way to classify health projects into different global CGH categories, and thus to track health funding for CGH routinely using data from publicly available databases.
format	Online Article Text
id	pubmed-9712779
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9712779 2022-12-02 Front Public Health Public Health Frontiers Media S.A. 2022-11-17 /pmc/articles/PMC9712779/ /pubmed/36466519 http://dx.doi.org/10.3389/fpubh.2022.1031147 Text en Copyright © 2022 Dixit, Mao, McDade, Schäferhoff, Ogbuoji and Yamey. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Tracking financing for global common goods for health: A machine learning approach using natural language processing techniques
topic	Public Health
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9712779/ https://www.ncbi.nlm.nih.gov/pubmed/36466519 http://dx.doi.org/10.3389/fpubh.2022.1031147
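
The abstract in this record evaluates its classifiers by weighted average F1-score. As a minimal sketch of that metric, the function below computes each class's F1 and weights it by the class's share of the true labels. The record does not say which library or exact averaging convention the authors used, so this is an assumption-based illustration: the function name `weighted_f1` and the labels "cgh" and "other" are hypothetical, and the toy data is invented for the example, not taken from the study.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Weighted-average F1: per-class F1, weighted by each class's true-label count.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    support = Counter(y_true)  # number of true examples per class
    total = 0.0
    for label, n_true in support.items():
        # True positives: predicted as `label` and actually `label`.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)

        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        total += f1 * n_true  # weight the class F1 by its support
    return total / len(y_true)


# Toy, made-up labels: two CGH projects and four non-CGH projects.
y_true = ["cgh", "cgh", "other", "other", "other", "other"]
y_pred = ["cgh", "other", "other", "other", "other", "cgh"]
score = weighted_f1(y_true, y_pred)
```

In this toy case the "cgh" class has F1 = 0.5 (support 2) and "other" has F1 = 0.75 (support 4), so the weighted average is (0.5 × 2 + 0.75 × 4) / 6 ≈ 0.667. Weighting by support means frequent categories dominate the score, which is worth keeping in mind when categories are imbalanced, as the abstract's mention of dataset balancing suggests.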


Similar items