A comparative study: classification vs. user-based collaborative filtering for clinical prediction
BACKGROUND: Recommender systems have shown tremendous value for the prediction of personalized item recommendations for individuals in a variety of settings (e.g., marketing, e-commerce, etc.). User-based collaborative filtering is a popular recommender system, which leverages an individual’s prior...
Main authors: | , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5146891/ https://www.ncbi.nlm.nih.gov/pubmed/27931207 http://dx.doi.org/10.1186/s12874-016-0261-9 |
_version_ | 1782473574440763392 |
---|---|
author | Hao, Fang Blair, Rachael Hageman |
author_facet | Hao, Fang Blair, Rachael Hageman |
author_sort | Hao, Fang |
collection | PubMed |
description | BACKGROUND: Recommender systems have shown tremendous value for the prediction of personalized item recommendations for individuals in a variety of settings (e.g., marketing, e-commerce, etc.). User-based collaborative filtering is a popular recommender system, which leverages an individual’s prior satisfaction with items, as well as the satisfaction of individuals that are “similar”. Recently, there have been applications of collaborative-filtering-based recommender systems for clinical risk prediction. In these applications, individuals represent patients, and items represent clinical data, which includes an outcome. METHODS: Application of recommender systems to a problem of this type requires recasting a supervised learning problem as an unsupervised one. The rationale is that patients with similar clinical features carry a similar disease risk. As the “Big Data” era progresses, it is likely that approaches of this type will be reached for as biomedical data continues to grow in both size and complexity (e.g., electronic health records). In the present study, we set out to understand and assess the performance of recommender systems in a controlled yet realistic setting. User-based collaborative filtering recommender systems are compared to logistic regression and random forests with different types of imputation and varying amounts of missingness on four different publicly available medical data sets: National Health and Nutrition Examination Survey (NHANES, 2011-2012 on Obesity), Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT), chronic kidney disease, and dermatology data. We also examined performance using simulated data with observations that are Missing At Random (MAR) or Missing Completely At Random (MCAR) under various degrees of missingness and levels of class imbalance in the response variable. RESULTS: Our results demonstrate that user-based collaborative filtering is consistently inferior to logistic regression and random forests with different imputations on real and simulated data. The results warrant caution in applying collaborative filtering to clinical risk prediction when traditional classification is feasible and practical. CONCLUSIONS: Collaborative filtering (CF) may not be desirable in datasets where classification is an acceptable alternative. We describe some natural applications related to “Big Data” where CF would be preferred and conclude with some insights as to why caution may be warranted in this context. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12874-016-0261-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5146891 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5146891 2016-12-15 A comparative study: classification vs. user-based collaborative filtering for clinical prediction Hao, Fang Blair, Rachael Hageman BMC Med Res Methodol Research Article BACKGROUND: Recommender systems have shown tremendous value for the prediction of personalized item recommendations for individuals in a variety of settings (e.g., marketing, e-commerce, etc.). User-based collaborative filtering is a popular recommender system, which leverages an individual’s prior satisfaction with items, as well as the satisfaction of individuals that are “similar”. Recently, there have been applications of collaborative-filtering-based recommender systems for clinical risk prediction. In these applications, individuals represent patients, and items represent clinical data, which includes an outcome. METHODS: Application of recommender systems to a problem of this type requires recasting a supervised learning problem as an unsupervised one. The rationale is that patients with similar clinical features carry a similar disease risk. As the “Big Data” era progresses, it is likely that approaches of this type will be reached for as biomedical data continues to grow in both size and complexity (e.g., electronic health records). In the present study, we set out to understand and assess the performance of recommender systems in a controlled yet realistic setting. User-based collaborative filtering recommender systems are compared to logistic regression and random forests with different types of imputation and varying amounts of missingness on four different publicly available medical data sets: National Health and Nutrition Examination Survey (NHANES, 2011-2012 on Obesity), Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT), chronic kidney disease, and dermatology data. We also examined performance using simulated data with observations that are Missing At Random (MAR) or Missing Completely At Random (MCAR) under various degrees of missingness and levels of class imbalance in the response variable. RESULTS: Our results demonstrate that user-based collaborative filtering is consistently inferior to logistic regression and random forests with different imputations on real and simulated data. The results warrant caution in applying collaborative filtering to clinical risk prediction when traditional classification is feasible and practical. CONCLUSIONS: Collaborative filtering (CF) may not be desirable in datasets where classification is an acceptable alternative. We describe some natural applications related to “Big Data” where CF would be preferred and conclude with some insights as to why caution may be warranted in this context. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12874-016-0261-9) contains supplementary material, which is available to authorized users. BioMed Central 2016-12-08 /pmc/articles/PMC5146891/ /pubmed/27931207 http://dx.doi.org/10.1186/s12874-016-0261-9 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Hao, Fang Blair, Rachael Hageman A comparative study: classification vs. user-based collaborative filtering for clinical prediction |
title | A comparative study: classification vs. user-based collaborative filtering for clinical prediction |
title_full | A comparative study: classification vs. user-based collaborative filtering for clinical prediction |
title_fullStr | A comparative study: classification vs. user-based collaborative filtering for clinical prediction |
title_full_unstemmed | A comparative study: classification vs. user-based collaborative filtering for clinical prediction |
title_short | A comparative study: classification vs. user-based collaborative filtering for clinical prediction |
title_sort | comparative study: classification vs. user-based collaborative filtering for clinical prediction |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5146891/ https://www.ncbi.nlm.nih.gov/pubmed/27931207 http://dx.doi.org/10.1186/s12874-016-0261-9 |
work_keys_str_mv | AT haofang acomparativestudyclassificationvsuserbasedcollaborativefilteringforclinicalprediction AT blairrachaelhageman acomparativestudyclassificationvsuserbasedcollaborativefilteringforclinicalprediction AT haofang comparativestudyclassificationvsuserbasedcollaborativefilteringforclinicalprediction AT blairrachaelhageman comparativestudyclassificationvsuserbasedcollaborativefilteringforclinicalprediction |
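
The abstract describes user-based collaborative filtering applied to clinical data, where patients play the role of users and clinical variables (including the outcome) play the role of items. The following is a minimal illustrative sketch of that idea, not the authors' implementation. The similarity measure (one minus the mean absolute difference over features observed in both patients, assuming features scaled to [0, 1]), the neighbourhood size `k`, the function names, and the toy data are all assumptions made for illustration.

```python
def similarity(a, b):
    """Similarity of two patients over the features observed (not None) in both.

    Assumes every feature is scaled to [0, 1], so the result is also in [0, 1].
    Returns 0.0 when the two patients share no observed feature.
    """
    diffs = [abs(x - y) for x, y in zip(a, b) if x is not None and y is not None]
    if not diffs:
        return 0.0
    return 1.0 - sum(diffs) / len(diffs)


def predict_outcome(target, patients, k=2):
    """Predict a risk score as the similarity-weighted mean outcome of the
    k most similar patients.

    patients: list of (features, outcome) pairs, outcome coded 0 or 1.
    Missing feature values are given as None and are skipped, not imputed.
    """
    scored = [(similarity(target, feats), outcome) for feats, outcome in patients]
    # Keep only neighbours with positive similarity, most similar first.
    neighbours = sorted((s for s in scored if s[0] > 0),
                        key=lambda s: s[0], reverse=True)[:k]
    if not neighbours:
        # No usable neighbour: fall back to the overall outcome rate.
        return sum(outcome for _, outcome in patients) / len(patients)
    total = sum(sim for sim, _ in neighbours)
    return sum(sim * outcome for sim, outcome in neighbours) / total


# Hypothetical toy data: three scaled clinical features per patient,
# with None marking a missing measurement, and a binary outcome.
PATIENTS = [
    ([0.9, 0.8, None], 1),
    ([0.8, None, 0.9], 1),
    ([0.1, 0.2, 0.1], 0),
    ([None, 0.1, 0.2], 0),
]

high_risk_like = predict_outcome([0.85, 0.9, None], PATIENTS, k=2)
low_risk_like = predict_outcome([0.15, 0.1, 0.15], PATIENTS, k=2)
print(high_risk_like, low_risk_like)
```

In this sketch the prediction depends only on a few nearest neighbours and on whichever features happen to be co-observed. A classifier such as logistic regression instead fits the outcome to all features directly, which is the comparison the abstract reports on.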