
Estimating disease prevalence from drug utilization data using the Random Forest algorithm

BACKGROUND: Aggregated claims data on medication are often used as a proxy for the prevalence of diseases, especially chronic diseases. However, linkage between medication and diagnosis tends to be theory based and not very precise. Modelling disease probability at an individual level using individua...


Bibliographic Details
Main authors:	Slobbe, Laurentius C J; Füssenich, Koen; Wong, Albert; Boshuizen, Hendriek C; Nielen, Markus M J; Polder, Johan J; Feenstra, Talitha L; van Oers, Hans A M
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2019
Subjects:	Public Health Monitoring
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6660107/ https://www.ncbi.nlm.nih.gov/pubmed/30608539 http://dx.doi.org/10.1093/eurpub/cky270

author	Slobbe, Laurentius C J; Füssenich, Koen; Wong, Albert; Boshuizen, Hendriek C; Nielen, Markus M J; Polder, Johan J; Feenstra, Talitha L; van Oers, Hans A M
collection	PubMed
description	BACKGROUND: Aggregated claims data on medication are often used as a proxy for the prevalence of diseases, especially chronic diseases. However, linkage between medication and diagnosis tends to be theory based and not very precise. Modelling disease probability at an individual level using individual-level data may yield more accurate results. METHODS: Individual probabilities of having a certain chronic disease were estimated using the Random Forest (RF) algorithm. A training set was created from a general practitioners database of 276 723 cases that included diagnosis and claims data on medication. Model performance for 29 chronic diseases was evaluated using Receiver-Operator Curves, by measuring the Area Under the Curve (AUC). RESULTS: The diseases for which model performance was best were Parkinson’s disease (AUC = .89, 95% CI = .77–1.00), diabetes (AUC = .87, 95% CI = .85–.90), osteoporosis (AUC = .87, 95% CI = .81–.92) and heart failure (AUC = .81, 95% CI = .74–.88). Five other diseases had an AUC >.75: asthma, chronic enteritis, COPD, epilepsy and HIV/AIDS. For 16 of 17 diseases tested, the medication categories used in theory-based algorithms were also identified by our method; however, the RF models included a broader range of medications as important predictors. CONCLUSION: Data on medication use can be a useful predictor when estimating the prevalence of several chronic diseases. To improve the estimates for a broader range of chronic diseases, research should use better training data, include more details concerning dosages and duration of prescriptions, and add related predictors like hospitalizations.
format	Online Article Text
id	pubmed-6660107
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6660107 2019-08-02 Eur J Public Health Public Health Monitoring Oxford University Press 2019-08 2019-01-03 /pmc/articles/PMC6660107/ /pubmed/30608539 http://dx.doi.org/10.1093/eurpub/cky270 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the European Public Health Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Estimating disease prevalence from drug utilization data using the Random Forest algorithm
topic	Public Health Monitoring
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6660107/ https://www.ncbi.nlm.nih.gov/pubmed/30608539 http://dx.doi.org/10.1093/eurpub/cky270
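The abstract reports model performance as the area under the ROC curve (AUC) and works with individual disease probabilities. The following is a minimal sketch of those two quantities only, not the authors' implementation: the Random Forest model itself is not reproduced, the function names and toy data are hypothetical, and averaging individual probabilities to obtain a prevalence estimate is an assumption about how such probabilities might be aggregated.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    Equals the probability that a randomly chosen case receives a higher
    score than a randomly chosen non-case, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def estimated_prevalence(probabilities: Sequence[float]) -> float:
    """Mean of individual disease probabilities (assumed aggregation rule)."""
    if not probabilities:
        raise ValueError("no probabilities given")
    return sum(probabilities) / len(probabilities)


# Hypothetical toy data: 1 = diagnosed by the GP, 0 = not diagnosed.
diagnosed = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical model output: predicted probability of disease per person.
predicted = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1, 0.1]

print(f"AUC = {auc(diagnosed, predicted):.2f}")
print(f"estimated prevalence = {estimated_prevalence(predicted):.3f}")
print(f"observed prevalence  = {sum(diagnosed) / len(diagnosed):.3f}")
```

An AUC of 0.5 corresponds to a model that ranks cases no better than chance, and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of people, so on a dataset of the size described in the abstract a rank-based computation or a library routine would be the practical choice.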